How to simplify redundancy and reduce its cost in an always-on world, in light of recent global IT outages.
Hardly a day passes without news of an unexpected outage of an important digital service, whether an e-commerce website or a government e-service.
The causes vary, but the suspects are the usual few: hardware failure, software misconfiguration, human error or cyber-attacks.
While these issues may seem easy to avoid in hindsight, the complexity and interconnectedness of today's systems make them hard to root out. The CrowdStrike-related global outage is a case in point.
A system patch meant to plug one loophole can end up crashing thousands of systems. A new piece of hardware meant to provide backup or redundancy may fail to fail over when the time comes. The list goes on.
Worryingly, disruptions are becoming more serious and more costly. According to the Uptime Institute, the share of major outages costing more than US$1 million each rose from 15% in 2021 to 25% in 2022.
Meeting expectations
At the same time, consumers, citizens and other users of these digital services expect them to run all the time. An occasional disruption may be forgiven, but depending on how critical the service is, the maximum tolerable downtime may range from a few days down to just a few hours.
Clearly, something has to improve. This is why many industries, from banking and finance to oil and gas, are turning to fault-tolerant systems, such as the Stratus ztC Endurance, which offers 99.99999% availability for running mission-critical applications.
To take the guesswork out and improve resilience against the common causes of downtime, Stratus combines a redundant computing architecture with intelligent automated management that prevents in-flight data loss and ensures data integrity.
Because a fault-tolerant server continuously and proactively monitors its own health, maintaining system availability and protecting against data loss happen automatically when needed.
Instead of firing up a cold backup system that can take hours or even days to restore data and applications, organizations can simply run the fault-tolerant server as their production system. Its redundant hardware is ready out of the box to keep running if a component fails, so disruption is avoided.
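To put the 99.99999% ("seven nines") availability figure in perspective, it helps to convert an availability percentage into the maximum downtime it allows per year. The sketch below does that arithmetic; it assumes availability is measured over a 365-day year, and the function name annual_downtime_seconds is a hypothetical illustration, not part of any Stratus tool.

```python
from decimal import Decimal

# Assumption: availability is measured over a non-leap, 365-day year.
SECONDS_PER_YEAR = Decimal(365 * 24 * 60 * 60)


def annual_downtime_seconds(availability_percent: str) -> Decimal:
    """Return the maximum downtime per year (in seconds) implied by an
    availability percentage such as "99.99999".

    Decimal is used so that values like 0.9999999 are represented exactly.
    """
    availability = Decimal(availability_percent) / Decimal(100)
    return (Decimal(1) - availability) * SECONDS_PER_YEAR


# Compare three common availability tiers.
for label in ("99.9", "99.99", "99.99999"):
    secs = annual_downtime_seconds(label)
    print(f"{label}% -> {secs:.2f} s/year ({secs / 60:.2f} min)")
```

On these assumptions, 99.99999% availability works out to about 3.15 seconds of downtime a year, compared with roughly 52.56 minutes at 99.99% and about 8.76 hours at 99.9%.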
Industry use cases
In the mining industry, this means that important systems monitoring the flow and transfer of valuable commodities, or running high-end operations in remote areas, stay up and running, ensuring uninterrupted supply and guarding against the commercial costs of disruption.
The same applies to banking and finance. With so much riding on today's digital finance systems, they must stay on even when unprecedented transaction volumes drive up computing load.
With built-in security controls helping to keep cyber threats at bay, companies in the financial services industry (FSI) have turned to fault-tolerant servers so they can prioritize operational tasks instead of maintaining their IT systems.
This is what happened when trading surged on an Indian stock exchange during the country's closely watched elections earlier this year. The exchange, a Stratus customer, was able to support the higher trade volumes without worrying about disruptions.
Solutions such as the new Stratus ztC Endurance are also designed for easy maintenance, with hot-swappable modules, and come with embedded security features. Operational staff on a factory floor, near a mine or at a container port can swap out faulty modules, such as storage modules, and carry out quick maintenance without specialized IT expertise.
Fault-tolerant systems also address another customer concern: cost. First, these resilient machines last longer, with an expected service life of seven to 10 years, which helps deliver a good return on investment. Second, because each is a single machine, it requires fewer software licences than a production server paired with a backup.