The key question on the mind of every airline executive recently is probably this: could what has just happened to BA happen to us?
Written by David Murphy, Enterprise Architect, Acora
Any mature business, regardless of vertical, sector, geography or internal structure, has at some point considered what is commonly referred to as ‘Business Continuity Planning’ – or BCP for short. BCP is typically defined as ‘the process of creating systems of prevention and recovery to deal with potential threats to a company.’
Due to the nature of our roles, IT professionals tend to be more familiar with BCP than most people in the business, probably because of our involvement in what we refer to as Disaster Recovery (DR) planning. Whereas DR deals primarily with the IT systems that underpin business processes, a BCP should cover every aspect of how the company operates – from personnel and workspace, to customer communications, to the ability to maintain effective management and leadership in any situation.
So, what makes up a BCP?
The most effective Business Continuity Plans include regularly updated documentation, stored in an off-premises, highly available location. They are also tested regularly, to ensure that the plans are up to date, relevant and effective, and that all personnel, regardless of their role within the business, understand the part they play in executing them.
The plans should also cover every foreseeable scenario relevant to the business. For example, as an IT company with hosting centres, Acora has plans for scenarios such as:
- Loss of power at an individual building, both short and long-term
- Loss of external data connectivity at an individual location, both limited and complete
- Telephony outages, partial or complete
- Loss of physical access due to adverse weather conditions
- Any major hardware failure
- Loss of access to, or data issues affecting, critical business applications
All of Acora’s documentation is stored both on and off premises, and all technical staff know where and how to access it. Key employees are also aware of what is expected of them should any of these scenarios arise.
When a scenario is identified that may require one of the plans to be executed, the members of the BCP committee are notified so that they can convene through an appropriate medium and make considered decisions about which plans are most relevant and whether to invoke them. This ensures that plans are not invoked without due care and consideration for potential follow-on impacts.
So, what happened to BA?
To be completely fair, it is a tough question to answer, because in truth no-one outside BA knows what really happened, and I imagine it will stay that way for a long time. The only details we have come from the public statements issued, which initially talked simply about a “Global power failure”. According to BBC News, this was later refined to: “it appears the power somehow went off at a Heathrow data centre and when it was switched back on, a power surge somehow took out the whole system.”
The thing is, failures happen. We know this, and as soon as that failure was identified, the relevant section of the BC plan would have been enacted. As IT professionals, we build in resilience to guard against events such as these by implementing UPS units, generators and physically diverse communications links. If the business calls for it, we can keep a complete, hot duplicate of all our primary systems online, with transactional replication and automated failover in case something catastrophic happens to the main facility. And let’s face it: BA is a global airline with follow-the-sun support and customer care, and tens of thousands of customers relying on its systems at any given time, so we would all expect it to have something at this level waiting to kick in.
For those of us looking in from outside the organisation (or at least for me), the real questions are:
Weren’t they prepared for that?
If so, what went wrong with the plan?
So, could a similar failure happen to you?
In short, the answer to this question is almost invariably “Yes – it could happen to you, and there is nothing you can do to change that”. Maybe the power goes off at the primary datacentre and your generator fails to kick in. Maybe Highways England digs up the road around the corner from your HQ and accidentally severs all the fibre. Failures happen.
The question you should really be asking is: “What can I do to mitigate the impact of such a failure?” Answering it is the essence of good BCP.
The takeaway from the BA situation can be boiled down to two words, a phrase familiar to pretty much every Boy Scout in the history of the Scout movement:
Be prepared.
Assume that failure can happen, and plan for it. Then, as with any process – test, test, test. A final point worth noting is that many organisations have two levels of testing. The first is an internal test of the IT DR plan, which could be described as a “technical test”; the second involves business users. An ideal BCP test gauges the effectiveness not only of the hard technical systems but also of the surrounding ‘soft’ processes and, of course, the ability of the users themselves to complete those processes in a timely fashion, in line with the expectations for recovery.