

HSCLA29A Error During AIX Live Partition Mobility

AIX, VIOS, and HMC error messages sometimes leave a lot to be desired.  As an example, I received error HSCLA29A while validating that one of my partitions would successfully migrate between POWER 595 frames.

Google didn’t come up with too many hits, and the official IBM online documentation says “Contact your next level of support.”  Not exactly what I was looking for.

As it happens, this LPAR recently had new LUNs assigned to it.  On a hunch, I ran ‘lsdev -vpd’ on the destination VIO servers and searched for the serial numbers of the new LUNs.  They didn’t exist.  A quick ‘cfgdev’ on each of the destination VIO servers fixed the problem.  Apparently, I forgot to do that when we added the storage.

To summarize: Error HSCLA29A appears to mean that LUNs are missing from the destination VIO servers.
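
The serial number check can be scripted if you have many LUNs to verify. What follows is only a rough sketch: the function name `missing_serials`, the sample text, and the serial numbers are all made up for illustration, and it assumes you have already saved the output of ‘lsdev -vpd’ from each destination VIO server as plain text.

```python
def missing_serials(vpd_text, serials):
    """Return the serial numbers that do not appear anywhere in vpd_text.

    vpd_text: saved output of 'lsdev -vpd' from one destination VIO server.
    serials:  serial numbers of the newly assigned LUNs.
    """
    # A plain substring search is enough: we only care whether the serial
    # shows up at all, not where in the listing it appears.
    return [s for s in serials if s not in vpd_text]


if __name__ == "__main__":
    # Hypothetical sample text, NOT real lsdev output.
    sample = "hdisk4 Serial Number 75ABC11001\nhdisk5 Serial Number 75ABC11002\n"
    new_luns = ["75ABC11002", "75ABC11003"]
    for serial in missing_serials(sample, new_luns):
        print(f"{serial} not found; run cfgdev on this VIO server and check again")
```

Any serial it reports is a candidate for the missing ‘cfgdev’ step described above.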

Posted in AIX, Virtualization.



Hanging up the Tech Blog

After months of telling myself, “I’ll update the tech blog soon”, I am finally admitting that I don’t enjoy owning a tech blog. And since I don’t enjoy it, it seems silly to keep it around as a testament to the fact that I don’t write very often.

I enjoy my job as an IT consultant. I enjoy presenting, teaching, and learning new things about servers, storage, and virtualization. I’ve discovered, however, that I don’t enjoy continuing my day job into the little bits of my day that aren’t consumed by work. Thus, the end of this failed writing experiment.

To my twitter followers…
First, thank you for following me. I am going to continue a twitter presence, but my tweets will likely drift away from things IT into things that interest me on a personal level. So, if you find tweets along the lines of “Sunny and hot in Louisville today, wish I were on the river.” to be an annoyance, you probably want to unfollow me. I’m not saying that I’ll never tweet IT again, but it won’t be a focus.

To the (dozen or so) people who have been reading this blog, again thank you. It will be going away soon if for no other reason than to save the $36/mo it costs in hosting fees.

To the real techbloggers out there, especially the independent ones, keep up the good work. I look forward to continuing to read your thoughts and ideas.

—–
Updated 2010-07-15: I’ve been prevailed upon not to remove this. I may even update it every now and then. Thanks for the kind words.

Posted in Other.


Is Your AIX Up to Date?

Nigel Griffiths (a name that should be familiar to all AIX administrators) has updated the AIXpert Blog with a reminder that new AIX service packs recently became available.  He’s recommending the following:

  • AIX 5.3:
    • TL09 SP7
    • TL10 SP4
    • TL11 SP4
    • TL12 SP1
  • AIX 6:
    • TL02 SP7
    • TL03 SP4
    • TL04 SP4
    • TL05 SP1

If you’re not on one of these levels, you probably should be.  While good conservative systems administration practices are a virtue, being too conservative is a vice.  IBM only supports a given technology level (TL) for around two years.  Being forced into an upgrade because fixes are no longer being created for your release is not a comfortable situation.

In general, I advocate running the latest SP of the N-1 TL.  This model has worked well for me.  It keeps my systems reasonably fresh while minimizing the upgrade risks.  I’ve also found that going from one AIX TL to another is almost always a painless upgrade.
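
To make the “latest SP of the N-1 TL” rule concrete, here is a small sketch that applies it to the levels listed above. The table and the function name `n_minus_one_target` are my own illustration, not anything IBM ships; the TL and SP numbers are simply the ones from the list in this post.

```python
# Recommended levels quoted in this post: TL number -> latest SP number.
RECOMMENDED = {
    "AIX 5.3": {9: 7, 10: 4, 11: 4, 12: 1},
    "AIX 6": {2: 7, 3: 4, 4: 4, 5: 1},
}


def n_minus_one_target(levels):
    """Return (tl, sp): the latest SP of the second-newest technology level."""
    tls = sorted(levels)
    if len(tls) < 2:
        raise ValueError("need at least two technology levels")
    tl = tls[-2]  # N-1: one technology level behind the newest
    return tl, levels[tl]


if __name__ == "__main__":
    for release, levels in RECOMMENDED.items():
        tl, sp = n_minus_one_target(levels)
        print(f"{release}: TL{tl:02d} SP{sp}")
```

With the levels above, the rule lands on TL11 SP4 for AIX 5.3 and TL04 SP4 for AIX 6.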

For more information on AIX technology levels be sure to check out Fix Central and the Fix Level Recommendation Tool.

Posted in AIX, IBM.



Back in the Server World

My short hiatus away from blogging accidentally turned into a three-month-long sabbatical.  I have a new employer, a new set of responsibilities, and most importantly, a new focus.  I will no longer be focusing on storage.  Instead, I will be looking at servers and systems.  Storage will still be featured from time to time, but it will no longer be the primary topic of my writings.

In many ways, servers and systems are “home” for me.  I started my career on the mainframe, moved to *nix, dabbled in Windows, and only fairly recently was focused exclusively on storage.  I’ve spent time with OSF/1, HP-UX, AIX, and of course, Linux.  My HP experience is pre-Itanium, so it is quite dated.  My Linux experience dates back to kernel 0.99, and my AIX experience goes back to AIX/6000 v3.

Despite my employer being an IBM value added reseller, I am going to try to stay away from marketing and focus on technology.  My efforts at being a product cheerleader could be charitably described as “unfortunate”.  In my new job, I work for the services side of the house, so I am much more hands on with the technology and spend less time reading marketing glossies.  I am hoping that this approach will work better.

Posted in Other.


Moving On…

I’m preparing to change the focus of this blog.  There is a lot that can be said about enterprise storage, but for the most part, the topic is being well covered by other people.  I’ve come to the realization that while storage system design is a fascinating topic, storage is a means to an end and not the end in itself.  Please don’t misunderstand me.  I remain passionate about the DS8000 and its capabilities.  I know it better than any other device I have worked with in my 14 years as an IT professional.  It’s an amazing machine.  The DS8000 is only now beginning to realize the capabilities inherent in its design.  The best is yet to come.  IBM may not be able to out-market its competitors, but we out-engineer them at every turn.

I am not entirely certain what my new focus will be.  It will certainly be broader than just storage.  I am going to take a couple of weeks off to decide what direction to take.

I am also leaving IBM.

I will be back in a couple of weeks with a new focus, a new employer, and new topics to discuss.  I hope to see all of you then.

Posted in Other.


RAID in the 21st Century

Storagebod’s musings on storage availability got me thinking about RAID (Redundant Array of Independent Disks) technologies and how they have evolved to handle ever larger drive and array sizes.  RAID is, after all,  a risk mitigation technique.  Disk drives fail.  Sometimes this is a pure mechanical failure.  Other times, the drive media may develop bad sectors which render portions of the drive unreadable.  In either case, data has been lost that must be recovered via redundant data stored within the array.

Historically, the most popular RAID technology has been RAID-5 (striping with distributed parity).  RAID-5 performs well, has excellent storage efficiency, and is reliable enough for most common uses.  RAID-1 (mirroring) and RAID-10 (mirroring and striping) are also common and have typically been used where RAID-5 does not perform well enough.  RAID-1/10 are also considered to be more resilient to failure than RAID-5.  This additional performance and resiliency come at the expense of greatly reduced storage efficiency.

Recently, RAID-5 has been showing its age.  As drive sizes have become ever larger, the amount of time required to reconstruct the data from a failed drive has increased as well.  This has led to uncomfortably long periods of time where a single bad sector discovered during array reconstruction can wipe out an entire RAID array.  Statistically speaking, RAID-5 still seems to be working well for enterprise Fibre Channel drives, but I have become uncomfortable with RAID-5 arrays constructed from large SATA drives.  (I define large as 500+ GB.)  I expect my discomfort to increase as drive sizes continue to grow.

RAID-6 (striping with two independently calculated parity values) is one possible solution to the problem of data integrity exposure during array reconstruction.  With RAID-6, three things (as opposed to two with RAID-5) have to go wrong before data is lost.  This is dramatically more reliable than RAID-5, and still much more efficient than RAID-1.  RAID-6, however, is not a perfect solution.
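
To put rough numbers on the efficiency trade-off, here is a small sketch using the textbook capacity formulas for each RAID level. The function name and the 8-drive array size are assumptions chosen for illustration; real arrays also set aside spares and metadata, which this ignores.

```python
def usable_fraction(level, drives):
    """Fraction of raw capacity left for data in a single array."""
    if level == "RAID-5":
        if drives < 3:
            raise ValueError("RAID-5 needs at least 3 drives")
        return (drives - 1) / drives  # one drive's worth of parity
    if level == "RAID-6":
        if drives < 4:
            raise ValueError("RAID-6 needs at least 4 drives")
        return (drives - 2) / drives  # two drives' worth of parity
    if level in ("RAID-1", "RAID-10"):
        return 0.5  # every block is stored twice
    raise ValueError(f"unknown RAID level: {level}")


if __name__ == "__main__":
    for level in ("RAID-5", "RAID-6", "RAID-10"):
        print(f"{level}: {usable_fraction(level, 8):.1%} usable with 8 drives")
```

For an 8-drive array, RAID-6 gives up one more drive’s worth of capacity than RAID-5 but still keeps well ahead of mirroring.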

The problem with RAID-6 is that most implementations are slow.  The additional I/O operations and parity calculations required by RAID-6 impose a significant performance penalty on write operations.  Clever implementations such as Intelligent Write Caching (available in DS8000 R4.2+), or the hybrid WAFL/RAID-DP approach taken by Data ONTAP (available in NetApp FAS and IBM N series arrays) significantly reduce the performance penalty of RAID-6.  In fact, DS8000 Intelligent Write Caching makes RAID-6 arrays on the DS8000 perform almost as well as pre-Intelligent Write Caching RAID-5 arrays.

So what about XIV?  XIV uses a completely different storage scheme named RAID-X.  RAID-X is a radical re-think of the way we store data in an array.  It is a hybrid of mirroring, massive parallelism, and dynamic balancing of system resources.  RAID-X’s goals are simple: make commodity level hardware perform at enterprise levels, make storage administration dramatically simpler, provide consistent performance in the face of wildly varying I/O requirements, and be able to seamlessly adapt to ever increasing hard drive sizes.

There’s a lot of misinformation about how RAID-X works.  In an attempt to clear matters up, I offer the following.

A fully populated XIV frame contains 15 modules.  Each module contains 12 disk drives, cache, and processor resources.  This gives us a total of 180 disk drives.  Today’s shipping XIVs use 1 TB SATA disk drives, so the raw capacity of the frame is approximately 180 TB.  (In reality, it is a bit smaller since a 1 TB drive doesn’t actually hold 1 TB of data, but that is a topic for another post.)  All data is mirrored internally and some space is set aside for spare capacity and system metadata.  This gives a usable capacity of 79 TB per fully populated XIV frame.
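
The capacity arithmetic above can be laid out step by step. This is a back-of-the-envelope sketch in nominal terabytes; the function name is my own, and the split of the reserved space between spare capacity and metadata is not stated here, so it is left as a single figure.

```python
def xiv_capacity(modules=15, drives_per_module=12, drive_tb=1, usable_tb=79):
    """Rough capacity breakdown for one fully populated frame, in nominal TB."""
    drives = modules * drives_per_module
    raw_tb = drives * drive_tb
    mirrored_tb = raw_tb / 2  # every partition is stored twice
    # Whatever is left after mirroring goes to spare capacity and metadata.
    reserved_tb = mirrored_tb - usable_tb
    return {
        "drives": drives,
        "raw_tb": raw_tb,
        "mirrored_tb": mirrored_tb,
        "reserved_tb": reserved_tb,
    }


if __name__ == "__main__":
    for name, value in xiv_capacity().items():
        print(f"{name}: {value}")
```

In nominal terms, mirroring halves 180 TB to 90 TB, which leaves roughly 11 TB for spares and metadata once the 79 TB of usable space is subtracted.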

When data comes in to XIV, it is divided into 1 MB “partitions”.  Each partition is allocated to a drive using a pseudo-random distribution algorithm.  A duplicate copy of each partition is written to another drive with the requirement that the copy not reside within the same module as the original.  This protects against a module failure.  A global distribution table tracks the location of each partition and its associated duplicate.  When a failure occurs, the system knows exactly which partitions are no longer protected and immediately begins creating new copies to restore redundancy.  This is where the parallelism of the design comes into play.  The entire machine goes to work re-creating the missing redundancy, so very little work has to be done by any one component.  This allows XIV to rebuild a failed 1 TB drive in minutes as opposed to the hours it would take in traditional RAID implementations.
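
The placement rule is easier to see in a toy model. The sketch below is NOT XIV’s actual algorithm: it uses Python’s `random` module as a stand-in for the pseudo-random distribution, and every name in it (`build_distribution_table`, `exposed_partitions`, the drive numbering) is invented for illustration. It only demonstrates the two ideas described above: a copy never shares a module with its original, and a distribution table tells you exactly which partitions lose redundancy when a drive fails.

```python
import random
from collections import Counter

MODULES = 15
DRIVES_PER_MODULE = 12


def build_distribution_table(num_partitions, seed=0):
    """Map partition id -> (primary, copy); a drive is a (module, slot) pair."""
    rng = random.Random(seed)
    drives = [(m, s) for m in range(MODULES) for s in range(DRIVES_PER_MODULE)]
    # For each module, the drives that live in every OTHER module.
    others = {m: [d for d in drives if d[0] != m] for m in range(MODULES)}
    table = {}
    for pid in range(num_partitions):
        primary = rng.choice(drives)
        copy = rng.choice(others[primary[0]])  # never in the same module
        table[pid] = (primary, copy)
    return table


def exposed_partitions(table, failed_drive):
    """Return {partition id: surviving drive} for partitions that lost a copy."""
    exposed = {}
    for pid, (primary, copy) in table.items():
        if primary == failed_drive:
            exposed[pid] = copy
        elif copy == failed_drive:
            exposed[pid] = primary
    return exposed


if __name__ == "__main__":
    table = build_distribution_table(100_000)
    exposed = exposed_partitions(table, failed_drive=(3, 7))
    sources = Counter(exposed.values())
    print(f"{len(exposed)} partitions lost a copy")
    print(f"surviving copies are spread over {len(sources)} drives")
```

Because the surviving copies end up scattered across many drives, restoring redundancy becomes a many-to-many copy rather than a single drive-to-drive copy.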

The most common FUD point raised against RAID-X is that it is vulnerable to a double-drive failure and since data is spread across the entire machine, the failure of any two drives will cause data loss.  While this makes a great talking point on a competitive slide, it is simply not the case.  Allow me to explain.

As I pointed out above, RAID is a risk mitigation technique.  The most common ways to mitigate the risk of data loss are to decrease the probability that a critical failure combination can occur, and/or decrease the window of time where there is insufficient redundancy to protect against a second failure.  RAID-6 takes the former approach.  RAID-10 and RAID-X take the combination approach.  Both RAID-10 and RAID-X reduce the probability that a critical combination of failures will occur by keeping two copies of each bit of data.  Unlike RAID-5, where the failure of any two drives in the array can cause data loss, RAID-10 and RAID-X require that a specific pair of drives fail.  In the case of RAID-X, there is a much larger population of drives than a typical RAID-10 array can handle, so the probability of having the specific pair of failures required to lose data is even lower than RAID-10.

Another difference between RAID-X and RAID-10 is the window of vulnerability between the time a drive fails and the time when full redundancy is restored.  RAID-10 has to copy the entire contents of the surviving member of the pair to a spare drive.  The copy time is directly proportional to the size of the drive since only one source volume is copying to one target volume.  While much faster than a RAID-5 rebuild, it can still take a while to copy a 1 TB drive.  The limiting factor is the transfer rate supported by the single source and target volume pair.

RAID-X operates differently.  When a drive fails, the global distribution table immediately knows the locations of all data that are no longer redundant.  This non-redundant data is evenly distributed among 168 drives.  The system immediately goes to work creating redundant copies of all exposed data.  This is done as a fully parallel, any-to-any operation with all surviving 179 drives participating.  Each drive only has to carry out 0.6% of the effort required to restore redundancy.  In addition, and this is key, only partitions that actually contain user data need to be copied.  If the system is only 50% full, RAID-X only needs to copy 500 GB worth of data to fully recover from a failed drive.  Contrast this to RAID-10,  which has to copy the entire drive regardless of the amount of user data actually stored on the drive.  Between the inherent parallelism of the design and the intelligence of the copy process, RAID-X can completely recover from a failed 1 TB drive in as little as 15 minutes.
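
The numbers in the paragraph above are easy to check. The sketch below is my own arithmetic, not an IBM formula. It assumes decimal units (1 TB = 1000 GB), and it combines the 50% full example with the 15 minute figure purely for illustration; the two are not necessarily measurements of the same scenario.

```python
def rebuild_estimate(drive_gb=1000, fill=0.5, survivors=179, minutes=15):
    """Back-of-the-envelope figures for restoring redundancy after one failure."""
    data_gb = drive_gb * fill  # only partitions holding user data are copied
    share_pct = 100 / survivors  # each surviving drive's share of the work
    per_drive_gb = data_gb / survivors
    # Implied throughput if all of data_gb is re-copied within `minutes`.
    aggregate_mb_s = data_gb * 1000 / (minutes * 60)
    per_drive_mb_s = aggregate_mb_s / survivors
    return {
        "data_gb": data_gb,
        "share_pct": share_pct,
        "per_drive_gb": per_drive_gb,
        "aggregate_mb_s": aggregate_mb_s,
        "per_drive_mb_s": per_drive_mb_s,
    }


if __name__ == "__main__":
    for name, value in rebuild_estimate().items():
        print(f"{name}: {value:.2f}")
```

Under these assumptions each surviving drive handles a little under 3 GB, which is why no single component becomes the bottleneck.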

I hope this sheds some light on how XIV’s RAID-X really works.  As with many new and creative approaches to old problems, there are a lot of misunderstandings, misinformation, and outright FUD in the marketplace concerning RAID-X.  I firmly believe that RAID-X is at least as reliable as any other mirroring technology and has further advantages, not all of which I have been able to include here.

Thank you for reading this rather lengthy post.  I look forward to continuing the conversation.

Posted in IBM, Storage.



Happy Holidays!

May you and yours have a joyous holiday season!  I’ll be back in early 2010.
Creative Commons License photo credit: kodamatic

Posted in Other.


Another Take on ‘Going to 11’

I’m not going to attempt to take credit for starting a resurgence of the Spinal Tap “our amps go to 11” meme, but since I referenced it in Inside the DS8700 Part 2, I’ve seen it in quite a few new places.  Perhaps I’m just sensitized to it now.  The latest place I’ve seen it is on Randall Munroe’s brilliant xkcd webcomic.  (Fair warning, Randall’s sense of humor is a little unusual and some of his comics may be considered by some to be unsafe for work.)

Click to be taken to the punchline


I almost never fail to be amused by Randall’s work, but this one almost caused me to choke on my coffee!  It’s very, VERY, good.  It’s also all too true in the technology industry.  For those of you who aren’t familiar with xkcd, I hope you enjoy it as much as I do.

Posted in Amusing.



Inside the DS8700 Part 3 – Summary and Wrap-up

I’ve spent the past few weeks describing some of the technological underpinnings of IBM’s new DS8700.  In Part 1, I discussed the new POWER6-based processor complexes.  In Part 2, I examined the move from RIO-G buses to a PCI Express fabric.  Today, I am going to wrap up this series with a summary and some odds-and-ends that didn’t fit under any of the other topics.

Being Green

To get started, let’s talk about being green.  As companies have realized that being green (environmental) can directly translate to having more green ($$$), IT departments have come under scrutiny.  Let’s face it.  IT is a power-hungry activity.  After all, it’s no coincidence that more and more datacenters are being built next to power plants.  Customers have begun looking at metrics like work per watt, capacity per watt, and other measurements of power efficiency.  It’s no longer enough to be fast.  Efficiency is also a requirement.

The DS8700 takes advantage of the energy efficient design of the POWER6 processor to deliver highly efficient performance.  As an example, the DS8700 is capable of delivering 10 IOPS/watt using traditional spinning disk.  This is over 50% more IOPS/watt than the DS8300, which was already quite efficient.  Install solid state drives into the DS8700 and this number jumps even higher.  This makes the DS8700 an attractive consolidation vehicle for older, less energy efficient storage devices.  Going green to save green couldn’t be easier.

New Management Interface

As with prior releases, the R5 microcode includes enhancements to the DS8000 management GUI.  The DS8000 line has always been a customer configurable device.  There never has been a requirement to contract a vendor engineer to come and configure your device for you.  Starting in R3, IBM began a re-work of the GUI to make the configuration process faster and more intuitive.  The R5 GUI contains new visualizations that make it easier to see the relationships between logical constructs and the underlying physical hardware.  It also contains a new real-time performance graph to help storage administrators see what is going on under the covers of the machine.

DS8000 R5 Real-time Performance View


DS8000 R5 Hardware Visualizer


Summary

To summarize, I’m going to quote from IBM’s DS8700 announcement presentation.

The DS8700 announcement introduces the most advanced model in IBM’s DS8000 lineup with up to over a 150% boost in performance.  This new hardware refresh not only offers much higher performance, it also builds on the DS8000’s unrivaled reputation for reliability and investment protection by maintaining its IBM POWER-based architecture over generations of new models.

This release underscores the commitment to our flagship enterprise disk platform and enables us to continue providing an ideal combination of optimized performance, scalability, reliability, and value that our most demanding customers expect from IBM.

That sums it up rather nicely, doesn’t it?

Posted in IBM, Storage.



Inside the DS8700 Part 2 – PCI Express

Welcome back to this behind-the-scenes look at the DS8700.  In Part 1, I examined the POWER6-based processor complexes that form the heart of the DS8700.  Today, I’m looking at the PCI Express gen2 I/O fabric that makes up the backbone of the most advanced version of IBM’s flagship enterprise disk product.

There are many design decisions made during the development of a storage subsystem.  One of the most fundamental is the interconnect topology used to connect all the components in the machine.  The DS8100 and DS8300 use a high-speed bus known as a RIO-G loop to connect the processor complexes and the PCI-X based I/O towers.  This has been a very successful design (as thousands of our customers can attest), but for the DS8700 we wanted more.  To borrow a phrase from Spinal Tap, we wanted the DS8700 to go to ‘11’.

We still have a RIO-G loop in the DS8700, but it is only used to connect the two POWER6 processor complexes together for synchronization and control purposes.  The big change is in how we connect the processor complexes to the I/O towers that contain the back-end disk adapters and the front-end host adapters.  For these connections, the DS8700 has replaced the RIO-G loops with a fabric of point-to-point PCI Express connections.  Each I/O tower has a dedicated 2 GB/s connection to each of the processor complexes.  This translates into a significant increase in the amount of data throughput we can sustain with the DS8700.

Making the move to PCI Express has brought more than increased performance to the DS8700.  It has also allowed us to further raise the bar in terms of reliability.  PCI Express adapters are intelligent devices.  Transient bit or CRC errors that can freeze other I/O technologies are caught in the PCI Express adapter and handled by the adapter itself.  Persistent errors can be dealt with by gracefully degrading the data transmission speed and notifying the processor complex of the problem.  By using smarter adapters that can self-heal, we add another layer of reliability to an already highly available system.

In my next article on DS8700 internals, I’ll be stepping away from the hardware and taking a look at some of the microcode enhancements in the R5 code that powers the DS8700.

Update 1:  Added link to YouTube clip that illustrates taking things to ‘11’.  Thanks David!

Posted in IBM, Storage.
