Posts Tagged ‘Documentation’

Opt in for /opt partitioning

January 13th, 2009

I’m currently reading chapters of the VMware Infrastructure 3 Advanced Technical Design Guide in my spare time and I came across their recommended ESX partitioning strategy.  I’d like to think I’ve got a pretty good handle on ESX partitioning.  I’m quite comfortable with it, and thus normally I would breeze quickly through partitioning documentation, verifying along the way that my partitioning was still on par.

My partitioning scheme was still looking good until I came across a new recommendation to create a dedicated /opt partition.  This is something I hadn’t done before.  Under normal circumstances, without a dedicated /opt partition, /opt is going to be a directory off / (root).  Why is this not the greatest idea?  I was enlightened by the fact that some VMware HA logging as well as some hardware agent logging is stored in /opt.  There are enough posts on the VMTN forums describing situations where excessive logging in /opt chewed up all available partition space on / to warrant proactive measures.  As we know, running out of disk space on / is less than ideal, and that’s putting it mildly.

Solution:  When building the ESX host, create a dedicated partition for /opt.

Now I’ll be honest, I’ve never run into a situation of excessive logging on /opt, but this is one of those strategies that falls into the category of “an ounce of prevention is worth a pound of cure”.  Learn from the experiences of other VI administrators who I’m sure suffered some downtime as a result.  Don’t wait for this to happen to you when it doesn’t need to.  The ESX host and its health are critical for datacenter and production operations.  When the ESX host isn’t happy, the VMs running on it usually aren’t happy either.
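
If you would rather catch this before it bites (or you have existing hosts that were built without a dedicated /opt), a simple check on the service console makes a decent early warning.  The sketch below is only an illustration using standard df, awk, and logger; the 90% threshold and the “opt-check” tag are arbitrary choices of mine, so adjust to taste and test before relying on it:

    #!/bin/sh
    # Warn when the filesystem holding /opt passes a usage threshold.
    # Intended to be run from cron on the ESX service console; 90% is an arbitrary example.
    THRESHOLD=90
    USED=$(df -P /opt | awk 'NR==2 {gsub("%",""); print $5}')
    if [ "$USED" -ge "$THRESHOLD" ]; then
        logger -t opt-check "WARNING: filesystem holding /opt is at ${USED}% capacity"
    fi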

That said, here is my updated partitioning scheme for ESX 3.5.0.

Create the following partitions in the following order:

  • /boot (ext3, 250MB, primary): The default from VMware is 97MB.  When we migrated from ESX 2.x to ESX 3.x, the partition size grew from 50MB to nearly 100MB.  I came up with 250MB to leave breathing room for future versions of ESX which may need an even larger /boot partition.  This is all a moot point for me because I don’t do in-place upgrades; I rebuild with new versions.
  • <swap> (1600MB, primary): Twice the maximum amount of allocatable service console memory.  My COS memory allocation is 500MB, but if I ever increase COS memory to the 800MB maximum in the future, I’ve already got enough swap for it without having to rebuild the box to repartition.
  • / (ext3, 4096MB, primary): The default from VMware is 3.7GB.  We want plenty of space for this mount point so that we do not suffer the serious consequences of running out.
  • /home (ext3, 4096MB): Not really needed anymore except as home directory storage for local user accounts, and the default from VMware is that this partition is no longer created.  For me this is just a carryover from the old ESX days, and disk space is fairly cheap (unless booting from SAN).  I’ll put this and other custom partitioning out to pasture when I convert to ESXi, where we are force fed VMware’s recommended partitioning.
  • /tmp (ext3, 4096MB): The default from VMware is that it doesn’t exist; instead the /tmp folder is created under the / mount point.  This is not a great idea.  VMware uses a small portion of /tmp for the installation of the VirtualCenter agent, but my philosophy is we should have plenty of sandbox space in /tmp for the unpacking/untarring of 3rd party utilities such as HP Systems Insight Manager agents.
  • /var (ext3, 4096MB): The default from VMware is 1.1GB, and additionally VMware makes the mount point /var/log, isolating that partition strictly for VMware logs.  We want plenty of space for this mount point so that we do not suffer the serious consequences of running out.  In addition, we want this to be a separate mount point so that it cannot consume the / mount point’s file system space.  VMware logs and other goodies are stored on this mount point.
  • /opt (ext3, 4096MB): The default from VMware is that it doesn’t exist; instead the /opt folder is created under the / mount point.  We want enough space for this mount point so that we do not suffer the consequences of running out.  In addition, we want this to be a separate mount point so that it cannot consume the / mount point’s file system space.  VMware HA logging and sometimes hardware agent logging are stored on this mount point.  This is a VI3 ATDG recommendation.
  • <vmkcore> (110MB): The default from VMware is 100MB.  I got the 110MB recommendation from Ron Oglesby in his RapidApp ESX Server 3.0 Quick Start Guide (this book is a gem, by the way; my copy is worn down to the nub).  Although I asked Ron, he never explained where he came up with 110MB, but let’s just assume the extra 10MB is cushion “just in case”.  This is the VMkernel core dump partition.  The best case scenario is that you rarely if ever have a use for this partition, although it is required by ESX whether it’s used by a purple screen of death (PSOD) or not.
Leave the remaining space unpartitioned.  This can be partitioned as VMFS-3 later using VirtualCenter for maximum block alignment.
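
For what it’s worth, the scheme above also translates nicely into a scripted (kickstart) installation so that every host comes out of the build identical.  The fragment below is only a sketch of the partitioning directives; the disk device name (sda) is an assumption, and the exact directive syntax should be verified against the scripted installation documentation for your ESX release:

    # Partitioning portion of an ESX 3.5 kickstart file (sketch only; device name assumed)
    part /boot --fstype ext3 --size 250 --asprimary --ondisk sda
    part swap  --size 1600 --asprimary --ondisk sda
    part /     --fstype ext3 --size 4096 --asprimary --ondisk sda
    part /home --fstype ext3 --size 4096 --ondisk sda
    part /tmp  --fstype ext3 --size 4096 --ondisk sda
    part /var  --fstype ext3 --size 4096 --ondisk sda
    part /opt  --fstype ext3 --size 4096 --ondisk sda
    part None  --fstype vmkcore --size 110 --ondisk sda
    # Remaining space is deliberately left unpartitioned and carved into VMFS-3 later from VirtualCenter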

Datacenters need shutdown/startup order

January 1st, 2009

Today I learned of a new blog called Virtual RJ which is owned by Robbert Jan van de Velde (yet another Dutch VMware virtualization enthusiast!).  I was reading an article he had recently written called Making inactive storage active in VirtualCenter.  What hits close to home for me about this article is the need for datacenter playbooks which outline a shutdown/startup order of infrastructure and servers.  Once upon a time, our environment was fairly simple and staff was small.  Although our environment was documented, the need for a formal shutdown/startup order was not so prevalent.  Over the years, staff has grown, new applications have been introduced to the environment, and the number of servers grew into the hundreds.  Not to mention, storage got out of control and with that we brought in SAN infrastructures.

Unless your datacenter is the size of a broom closet, chances are you cannot easily get away with throwing the master power switch to bring up infrastructure and servers in the right order.  Obviously you’re not going to use a power switch to shut everything down ungracefully either, but what may not be so obvious is that a graceful shutdown or startup of servers and infrastructure in random order may not be the best choice considering the health of the environment.

To understand the correct shutdown/startup order for your environment, you need to fully understand the web of datacenter dependencies, which can range from simple to highly complex.  Knowing your datacenter dependencies means having good documentation of its components:  servers (including clusters), applications, storage, authentication, network, power, cooling, etc.  Virtualization adds a layer as well, as I will show in a moment.  Let’s look at a few high level examples of dependencies:

  • Users depend on applications, workstations, network, VDI, etc.
  • Applications depend on databases, network, authentication, storage, other applications, etc.
  • Highly available databases depend on shared storage, clustered servers, etc.
  • Clustered servers depend on shared storage, authentication, network, quorum, etc.
  • Shared storage and network depend on power and cooling.
  • Consolidated virtual infrastructures (including VDI) depend on everything.

The list above may not completely fit your environment, but it should start to get you thinking about what and where the dependencies are in your environment.  Let me re-emphasize that without knowledge of how data flows in your environment, you won’t be able to come up with an accurate dependency tree.  Shutdown and startup orders aside, you’re in a scary position.  Start documenting quickly.  Talk to your peers, developers, managers, etc. to tie your datacenter components together.

So what does the dependency list above mean and how does it translate into a shutdown/startup order?  Well, workstations and VDIs typically have no dependencies and can be shut down first.  Application servers (including VMs) can be shut down next (except for the vCenter server – we’ll need that to shut down VMs and hosts).  Database cluster shutdown follows with the caveat that not all cluster nodes should be shut down at the same time – stagger the shutdown so as not to hang quorum arbitration and risk potential corruption of data.  At this point, if all VMs are shut down, we can use vCenter to shut down all ESX/ESXi hosts and then the vCenter server.  Once the hosts are down, authentication should no longer be needed, so let’s shut down the domain controllers.  Getting to the end of the list, we can shut down shared storage, SAN switches, and networking equipment (in that order).  Lastly, we pull the plug on phone systems, Twitter, cooling, and then sever the link to street power.  No really, just kidding – Twitter is not that much of a dependency.  I can quit Twitter any time I want.
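
If you want to take the manual clicking out of the ESX host portion of that sequence, the service console can do most of the work.  The sketch below is only an illustration of the idea using vmware-cmd: it asks every registered VM on the host for a graceful shutdown, waits a bit, and then halts the host.  The two-minute wait is an arbitrary assumption, the trysoft mode requires VMware Tools in the guests, and this should obviously be tested in a lab before it goes anywhere near production:

    #!/bin/sh
    # Gracefully stop every VM registered on this ESX host, then power the host off.
    # Illustration only; assumes VMware Tools is installed so "trysoft" can do a guest OS shutdown.
    vmware-cmd -l | while read VMX; do
        echo "Requesting shutdown of $VMX"
        vmware-cmd "$VMX" stop trysoft
    done
    # Give the guests a little time to finish before halting the host itself
    sleep 120
    shutdown -h now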

Now that we know shutdown order, startup order is typically simple – startup order is the reverse or inverse of the shutdown order.  Example:  Throw the switch for street power.  Engage cooling.  Turn on the PBX.  Fire up the network switches and routers.  SAN switches (go grab a coffee) then shared storage.  Domain controllers, ESX hosts, vCenter, app servers, blah blah blah.  You get the idea.

Everyone on your staff has both lists above memorized right?  If not, you need to get it documented in a shutdown/startup playbook.  I don’t feel one needs complex software or hired technical writers to put this together.  If you understand the dependencies, 85% of the work is already done.  My solution for what I put together was embarrassingly simple:  Microsoft Excel.

The tool itself doesn’t need to be incredibly complex; however, that doesn’t mean your shutdown/startup order will be as simple.  In the spreadsheet I maintain for my environment, I have a few hundred rows of information and many columns representing branch dependencies.  I also have a few different tabs in the spreadsheet with slightly different orders.  This is because we have multiple SANs, and if we’re only shutting down one of the SANs for planned maintenance, we only need to shut down its dependencies and not the entire datacenter including the other SANs.

Like many other types of documentation, the shutdown/startup order should be considered a living/breathing document that needs periodic care and feeding.  When new servers, infrastructure, or applications are brought into the environment, this document needs to be updated to remain current.  When datacenter components are removed, again, a document update is needed.  We’ve got a formal server turnover checklist which catches loose ends like this.  Any server that goes into production must have all the items on its checklist completed first (i.e. all documentation complete, added to the backup schedule, added to the server security plan, etc.)  Likewise, we also maintain a formal server retirement checklist to make sure we’re not trying to back up retired servers or consume static IP addresses of retired servers.

As our team becomes more distributed and expertise is honed to specific areas of the organization, it is important that all staff members responsible for the environment understand the requirements to shut it down quickly or in a planned fashion.  That means good documentation.  Better documentation also means your peers have the tools needed to do your job while you’re gone, and there’s less chance you’ll be called in the middle of the night or while on vacation.

Introducing: IT Knowledge Exchange/TechTarget

December 18th, 2008

Have you seen TechTarget’s IT Knowledge Exchange? If you are an IT staff member in search of answers or excellent technical blogs, ITKE is one site you’ll want to bookmark. Their award winning editorial staff include virtualization bloggers such as Eric Siebert, David Davis, prolific VirtualCenter plugin writer Andrew Kutz, Rick Vanover, Edward Haletky, and many more.

Search or browse by hundreds of tags covering hot IT topics such as Database, Exchange, Lotus Domino, Microsoft Windows, Security, Virtualization, etc.

Their value proposition is simple: provide IT professionals and executives with the information they need to perform their jobs—from developing strategy, to making cost-effective IT purchase decisions and managing their organizations’ IT projects.

One month ago, brianmadden.com was purchased by TechTarget. I think this addition will be a nice shot in the arm for ITKE. In one transaction, TechTarget gains an established, rich Citrix/Terminal Services/Virtualization knowledgebase and a talented staff of bloggers that it can in turn use to help its readers and advertising clientele.

TechTarget has over 600 employees, was founded in 1999, and went public in May 2007 via a $100M IPO.


MEPS (my ESX partitioning scheme)

December 15th, 2008

Here is a topic that has been discussed in great depth on the VMTN forums over the years but Roger Lund has asked me if I would post my ESX partitioning scheme.  Here it is, with a bit of my reasoning which I’ve learned along the way.  Enjoy!

Create the following partitions in the following order:

  • /boot (ext3, 250MB, primary): The default from VMware is 97MB.  When we migrated from ESX 2.x to ESX 3.x the partition size grew from 50MB to nearly 100MB.  I came up with 250MB to leave breathing room for future versions of ESX which may need an even larger /boot partition.  This is all a moot point because I don’t do in place upgrades.  I rebuild with new versions.
  • <swap> (1600MB, primary): Twice the maximum amount of allocatable service console memory.  My COS memory allocation is 500MB but if I ever increase COS memory to the 800MB max in the future, I’ve already got enough swap for it without having to rebuild the box to repartition.
  • / (ext3, 4096MB, primary): The default from VMware is 3.7GB.  We want plenty of space for this mount point so that we do not suffer the serious consequences of running out.
  • /home (ext3, 4096MB): Not really needed anymore and the default from VMware is that this partition no longer is created.  For me this is just a carryover from the old ESX days.  And disk space is fairly cheap (unless booting from SAN).  I’ll put this and other custom partitioning out to pasture when I convert to ESXi where we are force fed VMware’s recommended partitioning.
  • /tmp (ext3, 4096MB): The default from VMware is that it doesn’t exist, rather it creates the /tmp folder under the / mount point.  This is not a great idea.  VMware uses a small portion of /tmp for the installation of the VirtualCenter agent but my philosophy is we should have plenty of sandbox space in /tmp for the unpacking/untarring of 3rd party utils such as HP Systems Insight Manager agents.
  • /var (ext3, 4096MB): The default from VMware is 1.1GB and additionally VMware makes the mount point /var/log isolating this partition strictly for VMware logs.  We want plenty of space for this mount point so that we do not suffer the serious consequences of running out.  In addition, we want this to be a separate mount point so as not to risk the / mount point by consuming its file system space.  VMware logs and other goodies are stored on this mount point.
  • <vmkcore> (110MB): The default from VMware is 100MB.  I got the 110MB recommendation from Ron Oglesby in his RapidApp ESX Server 3.0 Quick Start Guide (this book is a gem by the way; my copy is worn down to the nub).  Although I asked Ron, he never explained to me where he came up with 110MB but let’s just assume the extra 10MB is cushion “just in case”.  This is the VMKernel core dump partition.  The best case scenario is you rarely if ever have a use for this partition although it is required by ESX whether it’s used by a purple screen of death (PSOD) or not.
Leave the remaining space unpartitioned.  This can be partitioned as VMFS-3 later using VirtualCenter for maximum block alignment.

Not sure how your current ESX partitions are configured?  Log on to the service console (COS) and run the command vdf -h
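
For example, to see at a glance how full each of the service console mount points is (vdf behaves like df but also reports VMFS volumes):

    # Human readable usage for the service console partitions and VMFS volumes
    vdf -h
    # Plain df works too, but it will not show the VMFS volumes
    df -h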

Update: The partitioning scheme above has been superseded in a new blog entry on 1/13/09.  /opt was added.  Here is the link to that post.

VMware revamps HCL publications

December 11th, 2008

The timing of VMware’s latest update is uncanny.  I had just written the other day about the VMware HCL and product documentation.  Yesterday, VMware launched a new Hardware Compatibility Guide portal, making it easier than ever to find out if your system, peripheral, storage, thin client, etc. hardware is compatible with VMware Virtual Infrastructure or VMware View.  The portal replaces the VI HCL document library previously maintained here, and in fact, all of the HCL documentation has been removed from that page and replaced with a link to the new portal.

Leery of the varying usefulness of query-based search engines, I tried the portal out for myself this morning by searching on “dl585”.  I was pleasantly surprised by the results.  Instead of indexing the HCL by VMware product platform (as was the case with the previous .pdf documentation library, where there was a separate HCL document for each major generation of VI), the portal returns a list of results indexed by my hardware query, displaying all versions of ESX that the dl585 hardware is compatible with.  In my opinion, this is much more efficient.

I then ventured over to the VMware View tab and searched on “chip pc” and was presented with a good-sized list of Chip PC thin clients compatible with VMware View.  Another search on “chip” produced the same query results; however, a search on “chippc” produced no results.  The query engine could use some polishing to showcase a more Google-like web 2.0 friendliness (“Did you mean chip pc?”).

Adding to a documentation junkie’s pleasure (that would be me), the portal also allows us to download the full version of the compatibility guides in .pdf format from the right side menu of the portal web page.  If that’s your thing, you still have the option to maintain your own offline .pdf repository.  This is one of the habits I do follow and I hope that VMware continues to notify us via RSS feed when an HCL has been updated, providing me with a direct link to the .pdf in the RSS feed so I can easily right click and “save as” into my offline document repository.

VMware configuration maximums

December 9th, 2008

Configuration Maximums for Virtual Infrastructure 3 is by far one of my favorite VMware documents.  This is a useful document for the VMware evangelist and any VMware VI administrator to have tacked up on the wall of their office for use as a quick reference.  It’s also handy for identifying platform comparison points of discussion or decision.

The document answers most of the “How many…”, “How much…” type questions about VMware Virtual Infrastructure capabilities (ESX hypervisor, VirtualCenter, guest VMs, etc.).  More than once I’ve used this document as the basis for interactive VMware trivia sessions at our local VMware User Group meetings.  This is one of the documents that is most often updated as new versions of VMware VI are released, so it’s a good one to keep tabs on.

The VI3 documentation page keeps us informed as to what date the document was last updated.  In addition, one of the RSS feeds I am subscribed to is VMware, Inc.  This feed lets me know the moment any of the VI documents are updated (at which time I then download the updated document for the document repository I maintain).  Hardware Compatibility List (HCL) documents seem to update almost weekly, which is a good indicator that VMware engineers are hard at work in their labs certifying compatible hardware, thereby expanding the list of hardware we may run our VI on.

The virtualization hypervisors (I never thought about it but is this the correct plural for hypervisor?) and management tools are evolving rapidly.  VMware, by far the most innovative of all companies in the virtualization arena, must have teams of technical writers keeping product documentation up to date.  For me personally, accurate product documentation is of the utmost importance and I hope VMware stays on top of it.  Vendor documentation is the gospel for the products and it defines what’s supported and what is not.  Keep yourself informed by reading the vendor documentation once in a while.  Even if you’re not into reading, at least know where the documentation is located for reference purposes.  I promise you the VMware configuration maximums is an interesting/fun read.

ps.  For those paying close attention, the scheduled server maintenance has been completed this evening.  I am now going out to shovel the snow in the driveway for the 3rd time in 24 hours.