Disaster Recovery: A Tape Survival Story

Computerworld is running a feature on how Estes Express Lines survived Hurricane Gaston dumping four feet of water into their data center.  Dick Cosby, Estes’s systems administrator, is quoted as saying “You are out of your mind if you think you can live without tape.  It makes zero sense to put up an all-SAN solution with data de-duplication. It is very expensive and not nearly as reliable.”

I had dinner with Dick at an IBM event a year ago.  We spent most of dinner talking about how Estes had recovered from the hurricane.  I was extremely impressed by Estes’s ability to fully recover even though the magnitude of the disaster outstripped anything they had planned for.  These guys have earned the right to make observations about what does and doesn’t make sense in a disaster recovery/business continuance strategy.  Unlike most of us in the vendor community, Estes has been there, done that, and lived to tell about it.



Extreme Data Recovery: Salvaging Data from the Space Shuttle Columbia

I clearly remember the morning of February 1st, 2003 when I heard that the Space Shuttle Columbia had been destroyed over the central United States. It was a vivid reminder of the dangers of space exploration. In addition to the loss of life, all of the data from the on-board experiments was also presumed lost. Now, five years later, one of the experiments has been completed thanks to a remarkable piece of data recovery.

The CVX-2 experiment stored its data on a 400MB Seagate hard drive. The drive was recovered from the debris, and although badly damaged by re-entry and impact, engineers at Kroll Ontrack were able to recover 90% of the data on the drive. This allowed the CVX-2 researchers to complete their experiment and bring a 20-year research effort to conclusion.

Hats off to the engineers at Kroll Ontrack. This is a feat on par with IBM’s recovery of the Space Shuttle Challenger’s flight data recorder tapes.

Source: Blocks and Files (includes a photo of the hard drive)



In the Beginning, There Was SCSI…

Any discussion of modern storage protocols must begin with SCSI. From its humble beginnings in the 1980s, SCSI has become nearly ubiquitous as the underlying technology used by most major storage protocols today. SCSI is an official standard architecture and is overseen by the T10 committee of the InterNational Committee for Information Technology Standards.

SCSI is both a physical cabling specification and a command set. For the purposes of this discussion, I will only be referring to SCSI’s logical definition. It’s been a long time since I’ve worried about SCSI cable lengths, wide, ultra-wide, HVD, LVD, etc. and I don’t plan to start now. Those days are gone.

The SCSI command protocol lives on, however. At its core, SCSI is a client-server protocol with some special terminology. Here are some common SCSI concepts that are useful to know:

  • Initiator - The device which sends (initiates) the SCSI command. This is usually a disk adapter card in the computer system.
  • Target - The device which receives and processes the SCSI command. This is generally a storage device (e.g., disk or tape).
  • Bus - A physical or logical connection between a collection of initiators and targets. A bus typically contains a single initiator and many targets, but multi-initiator buses are allowed.
  • Logical Unit (LU) - The logical representation of a storage device. A logical unit may be a single SCSI hard drive, or as is common in large storage arrays, a logical unit may be a subdivision of the larger array. (It’s possible for a large storage array to contain hundreds of logical units.)
  • Logical Unit Number (LUN) - A number assigned to an individual logical unit that uniquely identifies it on a given bus. The term ‘LUN’ is commonly used to refer to the logical unit itself, but this usage is technically incorrect. LUNs are used to identify the source and destination for SCSI commands.
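
To make the terminology concrete, here is a minimal Python sketch of the client-server relationship described above. It is a toy model, not the SCSI command set: the class names, the "READ"/"WRITE" operation strings, and the choice to key logical units by LUN inside each target are all assumptions made for illustration. The only borrowed SCSI vocabulary is the pair of status names, GOOD and CHECK CONDITION.

```python
from dataclasses import dataclass, field


@dataclass
class LogicalUnit:
    """Logical representation of a storage device (hypothetical model)."""
    name: str
    blocks: dict = field(default_factory=dict)  # block address -> data


@dataclass
class Target:
    """Receives and processes commands; holds logical units keyed by LUN."""
    name: str
    luns: dict = field(default_factory=dict)  # LUN (int) -> LogicalUnit

    def handle(self, lun, op, lba, data=None):
        # An unknown LUN or operation is reported as an error status.
        if lun not in self.luns:
            return ("CHECK CONDITION", None)
        unit = self.luns[lun]
        if op == "WRITE":
            unit.blocks[lba] = data
            return ("GOOD", None)
        if op == "READ":
            return ("GOOD", unit.blocks.get(lba))
        return ("CHECK CONDITION", None)


@dataclass
class Initiator:
    """Sends (initiates) commands, e.g. an adapter card in a server."""
    name: str

    def send(self, target, lun, op, lba, data=None):
        return target.handle(lun, op, lba, data)


# One array (target) exposing two logical units, addressed by LUN 0 and 1.
array = Target("array0", {0: LogicalUnit("vol0"), 1: LogicalUnit("vol1")})
hba = Initiator("hba0")
hba.send(array, 0, "WRITE", 10, b"hello")
print(hba.send(array, 0, "READ", 10))
```

Note that the number (the LUN) and the thing it names (the logical unit) are separate objects here, which is exactly the distinction the last definition draws.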

I’m not going to cover the specifics of the commands. For that, please visit the T10 site or Wikipedia. My intent for this installment was to provide these basic definitions and plant the idea that SCSI is the foundation for many of the protocols that we’ll be discussing in future installments.




Storage Networking: How Many Protocols Do We Need?

There seems to be an ever-increasing number of storage protocols entering the marketplace. Fibre Channel, iSCSI, FC-IP, FCoE, SATA, SAS, and others are available for use, but making sense of the choices can be confusing. Over the next couple of weeks, I’ll spend some time explaining and analyzing the different protocols in an effort to differentiate and position them. Each protocol has its own strengths and weaknesses, and no matter what your favorite vendor tells you, there is no single perfect protocol.




Who Backs Up the Data in Web 2.0?

Shifting resources from the desktop to the web is the hallmark of Web 2.0. I don’t need an office software suite installed on every system I use. I can connect to Google Docs from my Windows desktop, my MacBook, or even my EeePC and effortlessly edit the same document using the same interface across each platform. Companies have moved their entire customer relationship management systems to the web using services such as salesforce.com. Using computing services on the web reduces the need for individuals or companies to manage their own information technology needs. I, myself, am a frequent user of services like Google, Remember The Milk, Jott, and Sandy.

Along with online applications, storage has also been making its way into the cloud. While the concept of a storage utility isn’t new, only recently has the cost of storage and bandwidth come down enough for internet-based storage to become a consumer product. The idea of effortlessly storing data somewhere on the internet is attractive. I just did a home computing inventory and discovered that I have nearly 2TB of disk. Of that 2TB, more than half is tied up as backup drives. While I am able to survive the failure of any single drive, if I lose my house due to fire or natural disaster, I’ll most likely lose all my data. Applications like JungleDisk, Box.net, and the .mac iDisk all provide a simple means for off-site storage. (I’ve been looking at JungleDisk.)

An interesting thing has started to happen with these online storage providers. Instead of using them as a place to store an off-site backup, users are starting to use them as primary storage. This makes me nervous. Likewise, all of the web applications I mentioned above use their own storage devices to hold their users’ data. This also makes me nervous. Privacy and security issues aside, who backs it up? If my only copy of my data lives inside of one of these online storage providers, what happens when that data becomes inaccessible?

Unfortunately, this scenario seems to be playing out. As covered by Web Worker Daily and ReadWriteWeb, Omnidrive seems to have suddenly closed its doors, taking its users’ data with it. This is a scary scenario for anyone who makes extensive use of online services. It also shouldn’t surprise anyone. Businesses, especially ones offering very cheap (or free) services on the web, often close their doors.

My advice to everyone is to take a hard look at the data you are storing on other people’s servers.  Carefully evaluate what you are storing online, and create a contingency plan for use when that online resource (for whatever reason) isn’t available.  If it’s important, back it up somewhere you control.  And keep that backup updated.  Using Web 2.0 applications is fine; just remember that only you are responsible for your data.  Putting it on the web in someone else’s data center doesn’t make it their problem.



Another View of the Fibre Channel vs. iSCSI Protocol War

Jon Toigo at Enterprise Systems has posted his predictions for storage in 2008. He has an excellent alternative view of the fibre channel vs. iSCSI protocol battle. Despite the fact that I recently panned iSCSI, Jon raises interesting and valid points about fibre channel’s shortcomings, especially in the area of interoperability.

“Unlike FC switches, where standards have been developed to ensure that vendors can make their switches non-interoperable with one another even if they fully comply with the letter of ANSI standards, iSCSI switches (or rather IP switches) will work and play together.”

I, personally, as well as several of my customers, have been bitten by SAN interoperability issues. I’ve even griped on occasion that FC SANs should be as easy to assemble as IP networks. I just hadn’t extended that thought to iSCSI. Thanks, Jon, for opening my eyes to this one.

Interoperability issues aside, I still see fibre channel remaining the dominant storage connection protocol in 2008. Perhaps 2009 will finally be the year of iSCSI.



IDC Issues Top 10 Storage Predictions for 2008

IDC has published its top 10 storage predictions for 2008. Here’s my take.

1. Storage services models for data backup, archiving and replication will be more appealing to businesses.

I agree completely. More and more companies, especially smaller ones, are realizing that it doesn’t make sense for them to constantly be building ever larger storage infrastructures. The concept of a “storage utility” has been bouncing around since at least 2001, but it has only recently started to take off. Services like Amazon’s S3 and the oft-rumored G-drive are only the beginning.

2. New role-based storage systems will demand tighter integration between the storage layer and content-generating applications.

This one also makes sense. In a business context, storage isn’t important. It’s the data contained within the storage that is valuable. Unfortunately, our ability to categorize and contextualize data has not kept up with our ability to generate it. One of the on-going challenges of data management is to separate the valuable data from the background noise. Making the storage layer more aware of the contents of the data is a logical step.

3. Vendors will build object-based storage systems to classify data and add policies closer to the point of creation.

This is an extension of the above. Today many companies faced with data retention regulations adopt a “keep everything” policy. This leads to multiple challenges beyond the obvious cost of storage. By having a mechanism to automatically classify data as it is created, policy can be enforced in an autonomic manner. For example, the storage system will know that my quarter-end financial spreadsheet needs to be kept for some period of time, while my email asking my coworkers where they want to go to lunch can be discarded immediately.
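
As a rough illustration of classification at the point of creation, here is a small Python sketch. Everything in it is hypothetical: the metadata fields, the matching rules, and especially the retention periods, which are placeholders and not regulatory guidance. The point is only that a policy lookup can follow automatically once data carries a class.

```python
from datetime import timedelta

# Hypothetical policy table: data class -> retention period.
# None means "keep indefinitely", i.e. the old "keep everything" fallback.
RETENTION = {
    "financial": timedelta(days=7 * 365),  # placeholder period
    "transient": timedelta(0),             # may be discarded immediately
    "unclassified": None,
}


def classify(metadata):
    """Return a data class from simple metadata hints (made-up rules)."""
    kind = metadata.get("type")
    if kind == "spreadsheet" and "quarter-end" in metadata.get("tags", ()):
        return "financial"
    if kind == "email" and "lunch" in metadata.get("subject", "").lower():
        return "transient"
    return "unclassified"


def retention_for(metadata):
    """Look up the retention period for an item as it is created."""
    return RETENTION[classify(metadata)]


print(classify({"type": "spreadsheet", "tags": ["quarter-end"]}))
print(retention_for({"type": "email", "subject": "Where do we go for lunch?"}))
```

Anything the rules cannot place falls back to being kept, so a weak classifier degrades to today's behavior rather than deleting data it does not understand.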

4. Falling prices of solid-state disk drives will push mainstream adoption.

I’m not as optimistic on this point. I think that a lot more solid state storage will be sold in 2008, but I don’t think that 2008 will be known as the year of the solid state drive. I look for it in laptops and blade servers in 2008. The price point is just going to be too high for use in enterprise storage. Perhaps 2009.

5. Virtual servers will become an ideal conduit for iSCSI.

iSCSI has been predicted to take over the storage networking world since 2004. Every year it gets a little better and gains a little more market penetration, but fibre channel still prevails.

Don’t get me wrong, I like iSCSI. I also think that it is a natural fit with the virtual server ecosystem. The challenge is going to be one of speed. 4Gbit fibre channel is common and 8Gbit is just around the corner. Gigabit ethernet is common, but 10Gbit ethernet is still a bit unusual. Once you start stacking a lot of virtual server images onto your hardware, you’re going to start needing that extra bandwidth. I predict that iSCSI will ultimately be king in the smaller shops, but fibre channel will continue to reign in the enterprise.
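
The bandwidth argument is easy to put into numbers. The sketch below simply divides each link's nominal line rate evenly across a number of virtual servers. It ignores encoding and protocol overhead, multipathing, and the fact that real workloads are bursty and uneven, and the figure of 20 virtual servers per host is an arbitrary assumption.

```python
def per_vm_share_mbit(link_gbit, vm_count):
    """Nominal link rate split evenly across VMs, in Mbit/s."""
    if vm_count < 1:
        raise ValueError("vm_count must be at least 1")
    return link_gbit * 1000 / vm_count


# Nominal line rates in Gbit/s for the links mentioned above.
LINKS = {
    "1Gbit Ethernet": 1,
    "4Gbit Fibre Channel": 4,
    "8Gbit Fibre Channel": 8,
    "10Gbit Ethernet": 10,
}

VM_COUNT = 20  # assumed number of virtual servers sharing one link

for name, gbit in LINKS.items():
    share = per_vm_share_mbit(gbit, VM_COUNT)
    print(f"{name}: {share:.0f} Mbit/s per virtual server")
```

Under these assumptions a single gigabit link leaves each virtual server with 50 Mbit/s, against 200 Mbit/s on 4Gbit fibre channel, which is the gap that 10Gbit ethernet would need to close.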

6. Value-added storage services will become nontethered from storage infrastructure.

I hope this one comes true. I’m tired of vendors deliberately ignoring open storage standards in an effort to protect their install base. Aperi is a step in the right direction.

7. Full-disk encryption will be prevalent in the data center to satisfy compliance and safe harbor provision rules.

This one is a no-brainer. 2007 was the year of tape encryption. 2008 will absolutely be the year for full-disk encryption. The interesting challenge will be managing the encryption keys. That is ultimately much more difficult than scrambling the bits on the disks.

8. Offerings designed for small and midsize businesses featuring integrated storage and server technology will flood the storage market.

This is already happening.

9. Green storage initiatives will cause companies to seek nondisruptive/partial hardware upgrades.

I’m going to be honest. I don’t get this one. Can anyone help clue me in?

10. De-duplication, thin provisioning and virtual tape libraries will be in demand because of power saving efforts in the data center.

Ah, the buzzword trifecta of de-duplication, thin provisioning, and virtual tape. While legitimate arguments can be made for the power savings generated by storing less data, I don’t agree that virtual tape fits here. Think about it. What consumes more electricity, a virtual tape spinning around on a continuously rotating disk or a tape cartridge sitting quietly in a library? Tape isn’t dead. If anything, tape will become more important as an integral part of the green datacenter.

So, that’s my take on IDC’s 2008 storage predictions. Feel free to post your own comments or predictions. I’ll look forward to revisiting this post next December.

(via Computerworld)



Turning the SAN ‘Inside-Out’?

An article on eWeek caught my eye this morning. Seanodes, a Paris-based startup, has announced a product which, as they put it, “allows users to share disks and arrays directly attached to, or embedded in servers as if they were part of an external array.” The idea is that just as you use server virtualization to better utilize excess server processor resources, the Seanodes system will allow you to aggregate and better utilize excess directly attached storage.

In general, I’m not a fan of putting a large amount of storage into a server. Typically, I only use internal hard drives to boot the operating system. I put application data on some sort of centralized storage array. I’ve found that while the acquisition costs of external storage arrays are often higher than using internal storage, the flexibility afforded by decoupling storage from the server is invaluable. Lately, I’ve been experimenting with even moving the boot media off the server.

But let’s go back to the Seanodes Exanodes product. At this point it appears to be supported on Linux only and is aimed at cluster computing. Unlike clustered filesystems like IBM’s GPFS or Red Hat’s GFS, which allow multiple computers to access the same files, Exanodes works at the block level. The software aggregates bits of storage from all participating nodes and presents a synthetic hard drive to the system that uses it. This means that while the hard drives may be shared across multiple systems, the data contained within the synthetic hard drive is not shared. While storage capacity is aggregated and distributed, the data still belongs to a single server. It’s just potentially scattered across several servers.

Seanodes has built availability into the system. Every block of data stored inside of Exanodes is replicated across at least two participating nodes. Seanodes calls this RAIN, which stands for “Redundant Array of Independent Nodes”. While this insulates data access from a failed or powered-down node, it also cuts available storage capacity by at least half. The redundancy overhead is even higher if you use RAID arrays as the underlying physical storage.
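
The capacity cost is simple arithmetic, sketched below in Python. This is my own back-of-the-envelope model and not Seanodes' algorithm: the function names are made up, the round-robin placement is only one plausible way to put copies on distinct nodes, and raid_efficiency is an assumed fraction of raw disk left after RAID on each node (1.0 for plain disks, 0.5 for mirrored pairs).

```python
def usable_capacity(raw_per_node_gb, replicas=2, raid_efficiency=1.0):
    """Usable capacity when every block is stored `replicas` times."""
    if replicas < 1:
        raise ValueError("replicas must be at least 1")
    if not 0 < raid_efficiency <= 1:
        raise ValueError("raid_efficiency must be in (0, 1]")
    return sum(raw_per_node_gb) * raid_efficiency / replicas


def replica_nodes(block_index, node_count, replicas=2):
    """Pick distinct nodes for one block's copies (hypothetical round-robin)."""
    if replicas > node_count:
        raise ValueError("need at least as many nodes as replicas")
    return [(block_index + i) % node_count for i in range(replicas)]


nodes = [500, 500, 500, 500]  # four nodes with 500GB of spare disk each

print(usable_capacity(nodes))                       # two copies, plain disks
print(usable_capacity(nodes, raid_efficiency=0.5))  # two copies on mirrored disks
print(replica_nodes(5, len(nodes)))                 # nodes holding block 5
```

With four 500GB nodes, two copies of every block leave 1000GB usable out of 2000GB raw, and mirrored disks underneath cut that to 500GB.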

Seanodes has an interesting concept. I can see a fit for it in the commodity Linux cluster space. I’m unconvinced that it is useful in a more general computing environment. Centralized storage is becoming cheaper by the day and decoupling data from the server is the very heart of virtualization.

(Disclaimer: I am basing my analysis on the information published on Seanodes’ website. I have requested their whitepaper, but have not yet received it.)