
Cloud Servers: 'Persistence' May NOT Mean High-Availability

Mon, 3 Jun 2013


Question: From the perspective of hosting, what’s the difference between a cloud server and a Virtual Machine?

Answering half of that question is quite simple: a VM is just like a physical server, only the underlying resources are shared with other VM instances. A hypervisor layer abstracts those resources (RAM, CPU and disk space) to create containers in which operating systems such as Microsoft Windows Server 2012 or CentOS 6 can run.

To the user, there should be little or no difference between the virtual and the physical, and this includes the fact that, like a physical server, a VM is not by default mirrored or replicated in any live-live or active-passive configuration. If either a physical server or a VM fails, it's gone.

So how about a cloud server? The technology is essentially the same as a VM, to the point where there is a hypervisor abstracting resources. Yet many people will expect more from a product that advertises itself as a cloud server.

In the common techy consciousness, cloud has strong connotations of high-availability. Even in user-land, cloud SaaS services that we use day to day such as Gmail and other online accounts are expected to be available even when hardware belonging to those service providers fails.


For many IaaS customers, the 'always on' concept associated with cloud SaaS products translates into a 'deploy and forget' expectation when procuring a cloud OS instance. There is an expectation that, should a hypervisor fail, the server will simply be brought up elsewhere on the provider's grid.

Unfortunately, due to a disconnect between these historic expectations and the lack of a clear definition of what a cloud server is or should be, some confusion has entered the market around the notion of cloud servers and high availability. The product landscape has become, for want of a better word (and excuse the pun), a little cloudy.

For various reasons, perhaps a desire to automate services or to align spending projections, the straightforward view of cloud HA as described above has fallen victim to a leaner model of cloud computing which provides customers with only transient OS containers. Transient because the option to reboot a VM is essentially not available: once powered off, the contents of the VM are lost and all resources are returned to the compute pool.

Clearly, this less-than-ideal situation has been the cause of many an irate phone call to customer services. In response, a number of cloud server providers have come up with the pseudo-technical marketing term: Persistence. And here's where things get a little tricky.

Firstly, mid-to-large-scale providers such as Rackspace and Amazon market the idea of persistence in different ways. Amazon EC2 servers, for example, offer 'persisted' HA data storage through EBS (Elastic Block Store), including live data writes to EBS. Note: this persistence relates specifically to data volumes. In combination with periodic full-OS snapshot images, the suggestion seems to be that it is possible, with some manual assistance, to redeploy and update a server should the original EC2 instance disappear.
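For readers who like to see the moving parts, here is a minimal sketch of that snapshot-and-redeploy workflow using the boto3 Python library. The volume and instance IDs are hypothetical and the code is purely illustrative, not Amazon's prescribed procedure:

```python
import boto3

ec2 = boto3.client("ec2", region_name="eu-west-1")

# Point-in-time snapshot of an EBS data volume (hypothetical ID).
snapshot = ec2.create_snapshot(
    VolumeId="vol-0123456789abcdef0",
    Description="Nightly data-volume snapshot",
)

# Full OS image (AMI) of the running instance, so it can be redeployed
# later if the original instance is lost (hypothetical ID).
image = ec2.create_image(
    InstanceId="i-0123456789abcdef0",
    Name="app-server-weekly-image",
    NoReboot=True,
)

# Wait for the image to become usable, then launch a replacement from it.
# The latest data snapshot still has to be restored or re-attached by hand.
ec2.get_waiter("image_available").wait(ImageIds=[image["ImageId"]])
ec2.run_instances(
    ImageId=image["ImageId"],
    InstanceType="t2.micro",
    MinCount=1,
    MaxCount=1,
)

print("Snapshot:", snapshot["SnapshotId"], "Image:", image["ImageId"])
```

The manual step at the end is the point: nothing here brings the server back automatically when a hypervisor fails.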

In all fairness to Amazon, EBS is a sturdy storage option, locally mirrored with grid-wide replication, and not to be confused with the single-hypervisor storage provided with each EC2 server, or with S3. However, true HA (the cloudy kind) as per the above definition cannot be achieved unless there is load balancing between two separate EC2 instances.
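By way of illustration only, the sketch below (again boto3, with hypothetical names and instance IDs) shows the sort of arrangement meant by 'load balancing between two separate EC2 instances': a classic Elastic Load Balancer spread across two availability zones, with both instances registered behind it.

```python
import boto3

elb = boto3.client("elb", region_name="eu-west-1")

# A classic load balancer spanning two availability zones (hypothetical name).
lb = elb.create_load_balancer(
    LoadBalancerName="web-ha",
    Listeners=[{"Protocol": "HTTP", "LoadBalancerPort": 80, "InstancePort": 80}],
    AvailabilityZones=["eu-west-1a", "eu-west-1b"],
)

# Register two separate EC2 instances (hypothetical IDs). If one instance,
# or the hypervisor beneath it, fails, traffic continues to the survivor.
elb.register_instances_with_load_balancer(
    LoadBalancerName="web-ha",
    Instances=[
        {"InstanceId": "i-0aaa111122223333"},
        {"InstanceId": "i-0bbb444455556666"},
    ],
)

print("Point your DNS record at:", lb["DNSName"])
```

Note that this arrangement means paying for two instances plus the load-balancing service itself.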

Rackspace, on the other hand, offer a similar service but go a step further and allow 'persisting' of the full VM, with no separate data-volume tracking. From the literature available on the Rackspace site it would appear that, unlike the constant writes to storage provided by Amazon's data persistence, cloud server persistence at Rackspace relies on snapshot images saved periodically, which can then be redeployed in full.

This method improves on EC2 redeployment by avoiding the need to update an older image with newer data, but it does not avoid the problem of periodic imaging (as opposed to live writes or full load balancing): because the images are periodic, anything written since the last snapshot may be lost forever. Load balancing is of course available at Rackspace but, as with Amazon, a second instance will need to be purchased, as will the load-balancing service itself.
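To make the periodic-imaging risk concrete, here is a back-of-the-envelope Python sketch. The snapshot interval and write rate are invented figures, used purely to show the size of the exposure window:

```python
from datetime import timedelta

# Invented figures, purely to illustrate the exposure window left open by
# periodic imaging as opposed to live writes.
snapshot_interval = timedelta(hours=24)   # one full-VM image per day
writes_per_second = 50                    # assumed application write rate

# Worst case: the hypervisor fails moments before the next image is taken,
# so everything written since the previous image is gone.
max_lost_writes = writes_per_second * snapshot_interval.total_seconds()
print(f"Worst case: up to {max_lost_writes:,.0f} writes "
      f"({snapshot_interval} of data) lost.")
```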


What is clear is that a cloud server with VM-level or storage-level persistence does not mean cloud-like high availability. Nor does it mean that you can deploy and then safely forget that the server still lives on a single hypervisor.

What persistence does represent, on the other hand, is a set of methods by which providers avoid employing the part of cloud technology that delivered one of the original promises of cloud computing: continuous uptime should the underlying hardware fail. That promise has been re-represented and re-packaged, and as such it must be clearly understood by the buyer.

There are providers, ConnetU being one, who have no real need for the term persistence. All VMs/cloud servers are represented by the cloud controller with full live-write imaging and grid-wide (multi-hypervisor) replication; in the event of any one hypervisor failing, applications are automatically redeployed in full elsewhere on the grid. Although running the controller requires significantly more infrastructure and management expertise from the provider, it offers the customer peace of mind, which is our main priority.