Chris Hughes
Contributing Writer

How software reliability can help drive software security

Feature
Nov 01, 2021 · 6 mins
Application Security, DevSecOps

Adopting both devsecops and site reliability engineering concepts increases software availability and security by improving stability and shortening time to implement fixes.

Credit: PeopleImages / Getty Images

Software security and reliability have been compared and contrasted for several years, with the primary point being that both aim to protect customers and consumers. However, with the continued adoption and maturation of the devsecops and site reliability engineering (SRE) movements, there is an increasing overlap between the two. When both are carried out appropriately, that overlap maximizes stakeholder value.

Anyone who has worked in cybersecurity for some time is familiar with the CIA triad, which stands for confidentiality, integrity and availability. Metrics related to availability are a critical part of the devops (and subsequently devsecops) movement. These come in the form of the Devops Research and Assessment (DORA) metrics, which include mean time to recover (MTTR) and change failure rate (CFR). If proposed changes fail and impede availability, and it takes an unacceptable length of time to recover, you’ve now compromised both security and reliability.
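To make the two metrics concrete, here is a minimal sketch of how CFR and MTTR might be computed from deployment records. The `Deployment` record shape and its field names are hypothetical, and measuring recovery from the moment of deployment is a simplifying assumption; real data would come from a CI/CD pipeline or incident tracker.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List, Optional


@dataclass
class Deployment:
    # Hypothetical record shape, assumed for illustration only.
    deployed_at: datetime
    failed: bool = False
    restored_at: Optional[datetime] = None  # set once a failed change is recovered


def change_failure_rate(deployments: List[Deployment]) -> float:
    """Fraction of deployments that caused a failure in production."""
    if not deployments:
        return 0.0
    return sum(1 for d in deployments if d.failed) / len(deployments)


def mean_time_to_recover(deployments: List[Deployment]) -> Optional[timedelta]:
    """Average time from a failed deployment to restoration.

    Returns None when there are no recovered failures to average.
    """
    durations = [
        d.restored_at - d.deployed_at
        for d in deployments
        if d.failed and d.restored_at is not None
    ]
    if not durations:
        return None
    return sum(durations, timedelta()) / len(durations)


if __name__ == "__main__":
    start = datetime(2021, 11, 1, 9, 0)
    history = [
        Deployment(start),
        Deployment(start + timedelta(days=1), failed=True,
                   restored_at=start + timedelta(days=1, minutes=30)),
        Deployment(start + timedelta(days=2)),
    ]
    print("CFR:", change_failure_rate(history))
    print("MTTR:", mean_time_to_recover(history))
```

A high CFR combined with a long MTTR is the situation described above: changes fail often and availability stays impaired for a long time after each failure.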

We’re seeing a push for organizations to adopt secure devops practices, with studies such as the State of Devops Report showing that high performers don’t sacrifice speed for stability (availability). A growing number of organizations are adopting SRE principles, aiming to learn from outages and turn fragile systems into antifragile and reliable ones.

Short lead time for changes boosts security

Beyond the obvious relationship between MTTR and CFR, another DORA metric has a strong potential impact from the security perspective: lead time for changes. This metric is usually defined as the time it takes to go from code committed to code successfully running in production. It is generally discussed in the context of quickly being able to deliver value to customers or end users.

The ability to quickly deploy security-relevant updates to your systems is also an example of value delivery. If you can identify insecure components or vulnerabilities in your production systems, how quickly can you address those deficiencies without compromising the availability of your system? If you need to compromise the system’s availability to deliver security-relevant updates, you’ve now impeded both stability and security by interrupting availability and widening the window in which a malicious actor can exploit the vulnerability.
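The link between lead time and the exploit window can be sketched with a little arithmetic. The example below is an illustration under stated assumptions, not a standard formula: it treats the exposure window as the time taken to commit a fix after disclosure plus the team’s typical lead time for changes, and all function names are hypothetical.

```python
from datetime import datetime, timedelta
from statistics import median
from typing import List, Tuple

# Each change is a (committed_at, deployed_at) pair; the shape is assumed for illustration.
Change = Tuple[datetime, datetime]


def median_lead_time(changes: List[Change]) -> timedelta:
    """Median time from code committed to code running in production."""
    if not changes:
        raise ValueError("at least one change is required")
    seconds = [(deployed - committed).total_seconds() for committed, deployed in changes]
    return timedelta(seconds=median(seconds))


def estimated_exposure_window(
    disclosed_at: datetime, fix_committed_at: datetime, lead_time: timedelta
) -> timedelta:
    """Rough estimate: time to write the fix plus time to ship it (an assumption)."""
    return (fix_committed_at - disclosed_at) + lead_time


if __name__ == "__main__":
    base = datetime(2021, 11, 1, 8, 0)
    recent_changes = [
        (base, base + timedelta(hours=2)),
        (base, base + timedelta(hours=4)),
        (base, base + timedelta(hours=30)),
    ]
    typical = median_lead_time(recent_changes)
    window = estimated_exposure_window(base, base + timedelta(hours=6), typical)
    print("Median lead time:", typical)
    print("Estimated exposure window:", window)
```

Under this simplified model, every hour removed from the lead time is an hour removed from the period in which a known vulnerability remains exploitable in production.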

Software reliability as a security metric

The 2021 State of Devops Report also added reliability as a metric in response to a shift from availability. The logic behind the shift was to expand beyond availability and consider latency, performance and scalability. This seems logical: a system that can be reached, but whose performance is so degraded or whose ability to scale is so limited that it fails to meet business requirements, isn’t available in relation to its true need. That said, many in the security community may argue that availability is inclusive of these additional factors. NIST, for example, defines availability as data or information being accessible and usable as needed.

The report also points out that teams that exceeded their reliability targets were twice as likely to have security integrated into their systems development life cycle (SDLC). Reliability is often seen as a critical metric, since it is important to the customer and key to avoiding loss of market share and revenue. Integrating security into the SDLC ensures issues are identified and resolved sooner, when they are both cheaper and easier to fix, without causing significant disruptions to reliability as the report defines it.

Given that high reliability performers have among the lowest lead times, CFRs and MTTRs, they are also the most capable of applying critical security fixes quickly without compromising system reliability. This reduces the window of exploitability for adversaries and the overall security risk for systems and software. The 2021 State of Devops Report found that elite performers had a 6,570-times faster time to recover from incidents. When reliability is the most critical metric for customers and closely tied to retaining market share and generating revenue, this is very powerful. It is also a black eye from the availability perspective for organizations with poor performance in code deployment frequencies, lead times and change failure rates. This seems somewhat intuitive, given that the more you practice something, the better you get at it.

So, organizations that obsess over stability by being reluctant to change are actually the most vulnerable when it comes to recovering from incidents, including those related to security. This is important, given that incidents are inevitable.

The value of combining SRE and devsecops practices

There’s also an overlapping relationship between SRE and devsecops. SRE is essentially an approach that seeks to use software to manage systems, address challenges and automate tasks. It does all this in the pursuit of keeping promises to users and is often tied to metrics such as service level indicators/service level objectives (SLIs/SLOs).
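As a rough illustration of how an SLI is checked against an SLO, the sketch below computes a request-based availability SLI and the share of the error budget that remains. The function names, the request-count inputs and the 99.9% target are all assumptions chosen for the example, not values from any report.

```python
def availability_sli(good_requests: int, total_requests: int) -> float:
    """SLI: fraction of requests that were served successfully."""
    if total_requests <= 0:
        raise ValueError("total_requests must be positive")
    return good_requests / total_requests


def error_budget_remaining(good_requests: int, total_requests: int, slo_target: float) -> float:
    """Fraction of the error budget still unspent (negative means the SLO is breached).

    The budget is the number of failures the SLO tolerates: (1 - slo_target) * total.
    """
    if not 0.0 < slo_target < 1.0:
        raise ValueError("slo_target must be between 0 and 1, exclusive")
    if total_requests <= 0:
        raise ValueError("total_requests must be positive")
    allowed_failures = (1.0 - slo_target) * total_requests
    actual_failures = total_requests - good_requests
    return 1.0 - actual_failures / allowed_failures


if __name__ == "__main__":
    slo = 0.999  # hypothetical target: 99.9% of requests succeed
    sli = availability_sli(999_500, 1_000_000)
    budget = error_budget_remaining(999_500, 1_000_000, slo)
    print(f"SLI: {sli:.4%}, SLO met: {sli >= slo}, budget remaining: {budget:.0%}")
```

A team with budget to spare can afford to ship changes, including security fixes, more aggressively; a team that has exhausted its budget has a signal to slow down and stabilize.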

Organizations that combine SRE practices with devsecops methodologies are also much more likely to achieve their shared objectives and key results (OKRs) and meet desired business outcomes. Organizations that focus on improving the stability or reliability of their software are also much less likely to spend time fixing security issues. The argument for integrating both SRE and devsecops practices is supported by industry leaders such as Adrian Cockcroft, who proposes it in support of the idea of continuous resilience.

Studies and surveys continue to show that organizations that focus on both improving the reliability of their software and implementing devsecops practices reap the benefit of improved reliability and security. This makes sense, given the focus on shifting security left in the SDLC, decreasing security tech debt and embracing automation, while also seeking to maximize reliability and customer satisfaction.

Many have argued that reliability is a prerequisite for software security. An unreliable system isn’t secure, but a system can certainly be available in an insecure state. Other comparable areas of overlap include the reality that no system can be completely invulnerable to every possible threat or risk. Security is essentially managing risk within a defined level of risk tolerance, as often defined by the primary stakeholders. Similarly, no software or system can be thought of as entirely immune to every possible reliability and availability concern. That said, software and systems can be reliable as defined in SLOs and SLIs, as previously discussed.

When you couple the stronger performance of elite SRE and devops performers in terms of CFR, MTTR and reliability with the latest data breach insights from IBM’s 2021 Cost of a Data Breach report, the implications are clear. The global average total cost of a data breach is now $4.24 million. Furthermore, through September 2021, the total number of data breaches had already exceeded the total for all of 2020. The frequency of data breaches is accelerating. With organizations facing this increasing rate of attack and risk, the ability to quickly deploy security updates, recover from incidents, and ensure reliability for customers is more critical than ever.

Chris Hughes currently serves as the co-founder and CISO of Aquia. Chris has nearly 20 years of IT/cybersecurity experience. This ranges from active duty time with the U.S. Air Force and time as a civil servant with the U.S. Navy and General Services Administration (GSA)/FedRAMP to time as a consultant in the private sector. In addition, he is an adjunct professor for M.S. cybersecurity programs at Capitol Technology University and University of Maryland Global Campus. Chris also participates in industry working groups such as the Cloud Security Alliance’s Incident Response Working Group and serves as the membership chair for Cloud Security Alliance D.C. Chris also co-hosts the Resilient Cyber Podcast. He holds various industry certifications such as the CISSP/CCSP from ISC2, as well as both the AWS and Azure security certifications. He regularly consults with IT and cybersecurity leaders from various industries to assist their organizations with their cloud migration journeys while keeping security a core component of that transformation.