The day after a “faulty switch” sent the entire BBC website empire off air for an hour, a shouty row has broken out between the corporation and Siemens, its IT contractor.
The BBC News site originally ran an article last night quoting an internal email from Siemens and paraphrasing the cause of the crash as “they turned it off and back on again.”
The email sent by Siemens to BBC staff said:
Cause of issue: Faulty Switch … Services Impacted: Everything.
Siemens network engineers remotely powered down equipment at a second Internet connection at Telehouse Docklands. This got things back up and running again.
They then isolated the core router in Telehouse Docklands, and restored power to it. Once power was restored and the router was running in a satisfactory way, they reconnected to the internet and BBC networks in a controlled manner. Further investigations are ongoing to identify the root cause of this fault.
The Guardian reports that Siemens executives went ballistic when they saw the BBC’s article, prompting a hasty rewrite to remove all references to Siemens.
So did a Siemens engineer trip over the wire? Given the BBC’s penchant for keeping schtum about its downtime, I guess we’ll never know.
Elsewhere, the BBC’s Richard Cooper went on to explain the outage:
As many of you will have noticed (and reported on Twitter) the whole of BBC Online was down last night for an hour from 22:40 due to a major network incident. We would like to apologise to everyone that was unable to access BBC Online during this outage.
Our systems are designed to be sufficiently resilient (multiple systems, and multiple data centres) to make an outage like this extremely unlikely. However, I’m afraid that last night we suffered multiple failures, with the result that the whole site went down. Enough of the systems were restored to bring BBC Online pretty well back to normal by 23:45, and we were fully resilient again by 04:00 this morning.
For the more technically minded, this was a failure in the systems that perform two functions. The first is the aggregation of network traffic from the BBC’s hosting centres to the internet. The second is the announcement of ‘routes’ onto the internet that allows BBC Online to be ‘found.’ With both of these having failed, we really were down!
We’ll be taking a very hard look at what we need to do to make sure that this doesn’t happen again.