Ecbiz97 Filesystem Incident

Update – 03/13/2015: The failover plan for our ecbiz97 server that we mentioned in this announcement has been completed successfully. All account data from March 5th has been fully recovered. Please note: some accounts have not completed restoration, typically due to their size. If you find that you are missing files, they will reappear in your account shortly; this is no cause for alarm. For those who opted to bring information over from the temporary server, these requests have been completed as well.

As you may be aware, ecbiz97, the server housing your account, encountered a serious issue on Thursday afternoon, March 5th, which resulted in a prolonged service interruption well beyond what we initially expected. While we have made updates on our status pages, support center, and social media sites, we would like to take the time to fully disclose what happened on ecbiz97. The purpose of this article is to provide more detail on what specifically occurred, what you can expect over the next few days, and what we are doing to help prevent problems like this from occurring in the future.

The Details

On Wednesday evening March 4th, our monitoring team reported extreme sluggishness on the server which was initially traced back to a malfunctioning hard drive. The hard drive was replaced, and the problem subsided until the next morning when it was reported that the latency had returned. Further inspection showed the RAID card on the system was also not functioning properly, and an emergency maintenance window was authorized by our Architecture team to have it replaced. However, the replacement of the bad card revealed a second failing hard drive that would have been detected by our monitoring system had the original RAID card been working properly. This type of situation is extremely unusual and not one that we have experienced in the past.

This server runs several hard drives in a RAID 5 array, which tolerates the failure of a single disk at a time. When two hard drives are non-functional at the same time, it can cause system instability and data loss. In this case, the server failed to boot properly due to a corrupted file system, and we were forced to initiate a filesystem check (fsck) to correct this corruption. A combination of factors, including a failing hard drive, a recently replaced drive that had not finished rebuilding, and the legacy filesystem type of this class of server, caused the fsck to take much longer than originally expected.
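
To illustrate why a RAID 5 array survives one failed disk but not two, here is a minimal Python sketch of XOR parity on a single stripe. It is a simplified illustration, not a description of the actual ecbiz97 configuration: the function names (`xor_blocks`, `make_stripe`, `rebuild`) and the sample data are hypothetical, and a real array also rotates parity across disks and works on much larger blocks.

```python
from functools import reduce


def xor_blocks(blocks):
    """XOR equal-length byte blocks together, byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def make_stripe(data_blocks):
    """Return the data blocks plus one parity block (one simplified stripe)."""
    return list(data_blocks) + [xor_blocks(data_blocks)]


def rebuild(stripe):
    """Rebuild a stripe in which lost blocks are marked as None.

    Raises ValueError if more than one block is missing, because XOR parity
    carries only enough information to recover a single block.
    """
    missing = [i for i, block in enumerate(stripe) if block is None]
    if len(missing) > 1:
        raise ValueError("two or more blocks lost: stripe cannot be rebuilt")
    if not missing:
        return list(stripe)
    survivors = [block for block in stripe if block is not None]
    repaired = list(stripe)
    # XOR of all surviving blocks (data and parity) equals the lost block.
    repaired[missing[0]] = xor_blocks(survivors)
    return repaired


# Hypothetical sample data: three data "disks" plus one parity block.
stripe = make_stripe([b"site", b"mail", b"data"])

# One lost disk: the missing block is recovered from the survivors.
assert rebuild([stripe[0], None, stripe[2], stripe[3]]) == stripe
```

With two blocks marked as `None`, `rebuild` raises `ValueError`, which mirrors the situation described above: once a second drive fails before the first replacement has finished rebuilding, parity alone can no longer restore the data.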

On Friday, March 6th, our Architecture team made the decision to fail the server over to older backups. There were questions as to why we did not do this sooner. The decision was weighed based on how long the server might be down against how inconvenient the older data would be for customers on the server. Specifically, websites actively writing data to MySQL databases would be severely impacted, since MySQL data cannot be “merged.” With this in mind, and not having a reasonable ETA for how long the fsck would take, during which time the server is effectively offline, we ultimately decided to revert to backups on a temporary server.

On Monday, we were able to bring the physical ecbiz97 server back up with all original data intact. However, with the server largely unresponsive and sluggish, we were hesitant to replace the second failing drive. At this point, our Architecture team decided to mount the filesystem in the server’s “rescue” mode and copy all data to a new server, which again is a time-consuming process considering the state of the original ecbiz97 server.

Within the next 24-36 hours, we plan to power up the new server that contains the files that were in place on ecbiz97 as of Thursday 3/5 at approximately 2pm EST. Before we bring the server online, customers on this server will receive an email that they may reply to if they wish to retain the data that is currently in use. You may choose to retain any of the following:

  1. All databases that have been live for the last week
  2. All files that have been live for the last week
  3. Or both databases and files that have been live for the last week

If we receive no response, we will restore your account to the state it was in on March 5th (this is the most common option for users on this server). Any modifications made since March 5th at 2pm will need to be made again.

The temporary server will continue to be available for up to two weeks after the new server is online, in case any data needs to be retrieved. You will have access to this server if needed, though it should be noted that this server will no longer actively house your live website so no changes should be made to it. Our Support Department will be available to assist with any data migrations that may be needed. The information for both your old server and new server will be provided in a separate email when the transition is complete.

While we understand how this may inconvenience you, we believe this is the best-case scenario when taking into consideration the entire population of the server, and the age of the data present on the temporary server.

We would also like to inform you that your new server boasts solid-state drives and additional hardware redundancy to provide protection against the type of failure that ecbiz97 endured. You can learn more about our SSD platform here:

https://www.inmotionhosting.com/ssd-hosting#ssd-hosting

Again, we are very sorry for the frustration and inconvenience caused by this unexpected hardware failure.

We want you to know that we appreciate you being a long-standing and loyal customer with us, and want to assure you we are as dedicated to you and your website as you have been to us. If you have any further questions or concerns, please feel free to contact us. We’re available 24/7 via phone (888-321-4678 OPTION 2), chat, and email.

Thoughts on “Ecbiz97 Filesystem Incident”

  • Kenard and ANGRYcustomer. From what I can tell, it’s now business as usual and nothing happened for the “good folks” at InMotion Hosting. Their behavior is such that you would have thought that they had their systems back up in 48 hours, had provided non-stop communication throughout the crisis and that our public and vocal annoyance was out of line. I, too, found myself blocked, having sent nearly ten messages – nothing vulgar, merely pointing out, among other things,  their weaknesses and failures, and suggesting that the problem was worse than they wanted to admit. I also found it amusing that technology is their first language and English their second in a response I received to the only post they approved – if you go back and read it (that is, if they approve this one), you’ll see that I was pointing out that they should do the right thing customer service-wise and reimburse us a portion of our contract fees for the downtime we experienced. Again, I said that it would be a good-faith gesture in recognition and recompense for what they put us through the last two weeks – we, who could quite well nail them on a class-action suit and considerably damage their reputation in the marketplace. And so it goes…

  • Was up and running for 24 hours, lost connection again, and kept getting “  SMTP error from remote mail server after MAIL FROM:<christine@christine-barker.com> SIZE=1722:

        host mx1.hotmail.com [65.54.188.126]: 550 SC-001 (BAY004-MC4F48) Unfortunately, messages from 198.46.81.5 weren’t sent. Please contact your Internet service provider since part of their network is on our block list. You can also refer your provider to http://mail.live.com/mail/troubleshooting.aspx#errors.”

    SOOOO Called, tech told me that Hotmail “grey listed” (not their server of course *rolls eyes* but a neighbor server) Then I come on here and see someone else has been grey/black listed from Comcast.

     

    Not only am I down again, but tech support is incompetent at best!!

    Who is everyone jumping ship to? I need a good hosting company… 

    And… seriously… anyone else considering legal action? 

    • Hotmail server is rejecting all email as well. When I called, tech support didn’t even know about it and tried to tell me my email wasn’t working because they were copying files over. After finally refusing to get off the phone and verifying the issue through webmail (my only way to get email to work at all) they tried to tell me that their mail server was blocked because someone else’s IP address that was close to theirs is blocked…. Then I come on here and see Comcast is blocking their IP too…

      Who is everyone jumping ship to? I need a new hosting service.

      And seriously… is anyone looking into legal action? I am unable to connect anyone on hotmail or comcast (and who knows who else). Why wasn’t this checked before going live? The incompetence is overwhelming!!!

    • Oh, and my previous account has been blocked from posting comments on the support page… Hey guys… All I did was inform people that it only takes 3 people to start a class action law suit… no need to filter or mute people… your situation is already bad enough

  • As to Bruce’s comment above, will we be reimbursed/refunded for our downtime? I noticed that his mention of incurring a $3000 loss was neatly sidestepped. Now, while it may be difficult or impossible for your customers to quantify their losses, especially from potential business, you can certainly “make good” on your commitment to your customers by returning to us a portion of our yearly contract fees. It would be the kind of good-faith gesture that would help your reputation with your customers, both current and potential, and in the marketplace where said existing customers will be the most vocal about their experience with InMotion Hosting.

    • Good updates like this would have helped as this problem progressed! Your users were left in the dark with the same stale update posted for days.

    • Hello,

      I have forwarded your contact information to our customer service department, which should contact you shortly. They handle any type of refund/compensation request. Please keep in mind that shared hosting is not suitable for high-demand eCommerce websites, which should be on a server with more dedicated resources.

      Best Regards,
      TJ Edens

  • Live Support says they cannot do anything to help since there is zero access in the cPanel. There’s no way to “point” the email elsewhere. Either that, or the guys I spoke to have no clue.

    • Hello Raj,

      Thank you for contacting us. I definitely understand your frustration, as it has been difficult for us as well.

      Please contact Live Support, so they can update your Primary email address to one that is not hosted on your server (we recommend doing this anyway). Then, we can resend the email regarding ecbiz97 to you.

      Upon response from you, we can continue restoring your account, which should bring back the files relating to your Addon domain.

      If you have any further questions, feel free to post them below.

      Thank you,
      John-Paul

  • Since my emails are not working, it’ll be impossible for me to receive this email. Do you have a solution in place for those of us who’ve had to deal without having functioning email for 1 week +?

    • Hello Ed,

      Thank you for contacting us. I have notified our Live Support team that you have not received the email yet.

      They will be reaching out to you shortly to resend the email.

      If you have any further questions, feel free to post them below.

      Thank you,
      John-Paul

    • Hello reaj,

      Thank you for contacting us. I confirmed with our Live Support team that email was back up and functional on your server before the notification was sent.

      You should be able to check and see the email at this time.

      If you have any further questions, feel free to post them below.

      Thank you,
      John-Paul

    • Hello Bruce,

      Thank you for contacting us. I agree that the downtime caused by this hardware failure was long and unacceptable. We appreciate you staying with us so many years. As you know, this is not something that happens often.

      I have forwarded your email address to our Live Support team, and you will be included in the next round of emails.

      If you have any further questions, feel free to post them below.

      Thank you,
      John-Paul

    • Hello Richard,

      I removed your phone number as this is a public forum. You would just need to reply to the email you would have received, stating which option you would like transferred to the new server.

      Best Regards,
      TJ Edens

  • I lost $3000 in 2 days when my site was completely down. I have been with InMotion for over 7 years, and I think the recovery time for the server was too long and unacceptable.

    • Hello Ed,

      Can you please provide me with your primary domain so I can check your account and confirm whether or not you are affected by this outage? It will not be affecting all of our customers. Only those with accounts housed on the ecbiz97 server will be receiving an email. These customers will receive the email at the primary email address on file.

      Best Regards,
      TJ Edens

  • What is the www. address of the website you are talking about? I have different sites running on different companies’ servers.
