Exiting the Cloud: What about “long term retention data”?
Going to the Cloud was a decision many customers made a few years back (and some are still making it, by the way). Time has passed, and some are now considering leaving the Cloud. Whether you decide to bring your infrastructure back in-house or to move to another provider, you have to plan for recovering your data, from live data down to the so-called long term retention backups or archives. How can you deal with that?
A while back at POST, we decided to design a Cloud Transition methodology that covers both the on-boarding processes (the GoCloud and RunCloud phases) and the exit processes (reversibility, which we refer to as the StopCloud phase). On paper, it was clearly important to make sure that we had the technology, processes and capabilities to support customers when they decide to quit. In reality, now that some customers are making that move, we have to ensure that the system works as we designed it, and that’s a whole other story: from restoring VMs to application data to backups, several challenges need to be met.
The live data, the easy part
If we look at the live data that customers need when exiting, we are talking about the Virtual Machines, their hard drives and the application data. Most of the time this is easily done, but it all depends on the volume of data to transport as well as on the migration frequency or cutover approach decided by the new data operator. Also, the new provider (whether internal or external) can decide on a logical migration of data, taking the opportunity offered by the move to wipe the slate clean and make a fresh start, which always makes a lot of sense in that situation. This ultimately means that for small volumes the transfer can take place over the internet or using hardware media (ideally encrypted, of course) that are exchanged between the two teams.
For large volumes, POST, relying on its private backbone, can set up L2 extensions from its private Cloud environments to the customer’s datacentre, ensuring the highest possible throughput and enabling more frequent copies. But such connectivity cannot be delivered for just a few weeks; it needs to be ongoing. Therefore the costs have to be taken into consideration.
The backup data, the challenging part
When you need to hand back backup or archived data, numerous other problems arise.
First and foremost, it is important to think about the retention. For customers with a short term retention (up to 30 days), this is rarely an issue. The time needed for the migration, plus the time given to everybody to arrange a proper transition and contract closure, will be well in line with the retention period. The new system can therefore be fully in place and operational while the old one is kept running until the overall exit is finalized.
For longer retention periods, e.g. one year achieved by means of 12 monthly copies (the usual approach), keeping the contract running for that long would usually not make sense; on the other hand, restoring the data might not be completely achievable. Indeed, three major issues intertwine here:
The deduplication paradox:
Deduplication is that very interesting piece of technology that allows backups to take up as little storage as possible: any data already captured once is not stored a second time, but simply referenced through a link in the backup index. Think about all those .DLL files that make up the Windows OS and are exactly the same on all your servers… For such files, a single copy is kept in the backup while all the other copies are simply “references” to that initial copy. This saves a lot of space, especially for large Cloud Providers like POST. Now if you think about restoring the data of the last twelve months, this means that we need to “rehydrate” the data, because while deduplication is understood by the backup system, it is (almost) never understood by mainstream storage technologies (a USB disk, for example). The second issue when rehydrating data back to the customer is dating. Since we have, in our example, twelve copies of the data, one for each end of month, we would need to provide our customer with those exact twelve copies, clearly named. Although possible, this takes time.
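To make the mechanism more concrete, here is a minimal Python sketch of a deduplicating store and of what rehydration does to the volume. It is a toy model, not the way any particular backup product works: the class DedupStore, the labels "month-01" to "month-12" and the file names are all hypothetical, and real products usually deduplicate at block level rather than on whole files.

```python
import hashlib


class DedupStore:
    """Toy content-addressed store: each unique piece of data is kept once."""

    def __init__(self):
        self.blobs = {}  # digest -> bytes, stored a single time
        self.index = {}  # (backup label, path) -> digest, the "links"

    def backup(self, label, files):
        """Record one backup; identical content is only referenced, not re-stored."""
        for path, data in files.items():
            digest = hashlib.sha256(data).hexdigest()
            self.blobs.setdefault(digest, data)
            self.index[(label, path)] = digest

    def stored_bytes(self):
        """Space actually used inside the deduplicated store."""
        return sum(len(blob) for blob in self.blobs.values())

    def rehydrate(self, label):
        """Rebuild one dated copy as plain files, as a USB disk would need it."""
        return {
            path: self.blobs[digest]
            for (lbl, path), digest in self.index.items()
            if lbl == label
        }

    def rehydrated_bytes(self):
        """Space needed once every dated copy is written out in full."""
        return sum(len(self.blobs[digest]) for digest in self.index.values())


store = DedupStore()
dll = b"x" * 1000  # stand-in for a system file identical on every server
for month in range(1, 13):
    store.backup(
        f"month-{month:02d}",
        {
            "srv1/kernel32.dll": dll,
            "srv2/kernel32.dll": dll,
            "srv1/report.txt": f"report {month}".encode(),
        },
    )

print("stored in backup system:", store.stored_bytes(), "bytes")
print("after rehydration:      ", store.rehydrated_bytes(), "bytes")
```

The identical file is stored once but has to be written out 24 times (two servers, twelve monthly copies) when the data leaves the backup system, which is where the extra volume and the need for clearly named dated copies come from.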
The volume of storage paradox:
Rehydrating data back to the customer will take space, lots of space (depending on the customer’s data types, of course). As an example, I was looking at this topic for a customer with two years of retention on roughly 4TB of data. Our calculations showed that we would have to hand back close to 94TB to the customer, whether transferred online or offline. This is a lot of data to move, and once back in the hands of the customer it needs to be merged into the new backup mechanisms. Even if this can be made to work with standard file system backups, it is mostly impossible to obtain something consistent with application backups. Indeed, how would the new backup system know that the data you feed to it dates back to May last year?
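The order of magnitude can be checked with some quick arithmetic. The sketch below assumes monthly full copies (the usual approach mentioned earlier, not confirmed for this particular customer), decimal terabytes, and a hypothetical 1 Gbit/s link used at 70% efficiency; the function names are illustrative only.

```python
def rehydrated_volume_tb(protected_tb, copies):
    """Naive upper bound: every retained copy is written out in full."""
    return protected_tb * copies


def transfer_days(volume_tb, link_gbps, efficiency=0.7):
    """Days needed to move volume_tb over a link, at an assumed efficiency."""
    bits = volume_tb * 1e12 * 8  # decimal TB to bits
    seconds = bits / (link_gbps * 1e9 * efficiency)
    return seconds / 86400


copies = 2 * 12  # two years of monthly copies (assumption)
upper_bound = rehydrated_volume_tb(4.0, copies)
print(f"{copies} copies x 4 TB = {upper_bound:.0f} TB at most")

# Assumed link: 1 Gbit/s, 70% usable. Volume: the ~94 TB quoted above.
print(f"about {transfer_days(94, 1.0):.1f} days to transfer 94 TB")
```

Under these assumptions the naive upper bound (96 TB) is in the same range as the figure we calculated, and the transfer alone would keep a dedicated link busy for roughly two weeks.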
The licensing paradox:
The last issue is the software and technology used for backup. Unless the new provider is using exactly the same software (and most likely the same version and release…), you cannot reuse the original backup data under any circumstances. This once again makes rehydrating all the data outside of the backup system the only viable option. If your backup solution includes physical hardware and the licensing can be extended to the new provider, you could think about reselling this piece of equipment, but this makes the asset management and the contractual part a bit more complex.
The contractual approach
As the previous sections show, restoring long term retention data is complex, and it gets more complicated the longer the retention (5 or 10 years, for example).
The last option we investigate is purely contractual: providing our customers with a “Data Conservation Contract”. The form and execution of such a contract might vary from one provider to another, but the general principles are constant: the data stays where it is until all retention periods have completely expired, and thus until all data has been completely removed from the backups. You have the guarantee that a data set will be restored to you in a usable format when you need it. Again, this is easy to say and do for “file level backups”, but not so easy for application level backups, because one will need an application to read and host the restored data set (a database, for example…), not to mention the GDPR issues related to that.
As a takeaway, we can only stress once again that reversibility probably remains one of the most complex parts of your outsourcing and Cloud projects.
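How long such a contract has to run follows directly from the retention policy. The sketch below is only an illustration under simple assumptions: one copy per month, each kept for the same number of months, with months represented by their first day. The function names and the dates are hypothetical.

```python
from datetime import date


def shift_month(d, months):
    """Shift a first-of-month date by a (possibly negative) number of months."""
    total = d.year * 12 + (d.month - 1) + months
    return date(total // 12, total % 12 + 1, 1)


def conservation_end(exit_month, retention_months):
    """The last copy is taken in exit_month and expires retention_months later."""
    return shift_month(exit_month, retention_months)


def copies_still_held(exit_month, retention_months, today):
    """Monthly copies that have not yet expired on `today`, oldest first."""
    held = []
    for age in range(retention_months):
        taken = shift_month(exit_month, -age)
        if shift_month(taken, retention_months) > today:
            held.append(taken)
    return sorted(held)


# Hypothetical example: the customer exits in June 2024 with 12 monthly copies.
exit_month = date(2024, 6, 1)
print("contract can end on:", conservation_end(exit_month, 12))

remaining = copies_still_held(exit_month, 12, today=date(2024, 12, 15))
print("copies still held in mid-December 2024:", len(remaining))
```

With a 5 or 10 year retention, the same calculation shows the contract running for years after the exit, which is worth pricing in from the start.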
Today, the cloud is the leading driver of digital transformation for organisations. Through it, they can consume IT resources or services much more easily, in order to innovate better, strengthen their relationships with users, improve their marketing, gain in productivity…
In training, through technical assignments, consulting engagements or as a team leader, I have accompanied the evolution of the IT mainframe for more than two decades. The cloud is the culmination of a dazzlingly fast evolution, and it is the best answer to companies’ needs for mobility, security and flexibility.
As an expert, I want to convince economic players, small or large, of the benefits this technology brings. The challenge is to understand their concerns clearly in order to address them better. Our role is to implement the solution best suited to our customers’ needs, starting from the best architecture and drawing on our own cloud offerings or those of our global partners. Every project is an exciting challenge.
While passionate about technology and still amazed by how it lets us go ever further, I remain deeply attached to human relationships that are authentic and grounded in reality. Although our mission is to enable our customers to adopt the most advanced IT capabilities, that does not prevent us from reflecting on the impact technology has on our societies, our behaviour and our pace of life.