SRE’s Guide to Pragmatic Incident Response

Content by DevOps.com

In my past experience as an SRE, I learned some valuable lessons about how to respond to and learn from incidents. If you want the TL;DR, I’ll summarize them here: Declare and run retros for the small incidents. It’s less stressful, and...

The Key Benefits of Observability for SREs

In today’s technology landscape, organizations strive to champion innovative ideas, techniques and technologies to achieve success and outshine their competitors. For this reason, site reliability engineering (SRE) has become one of the fastest-growing enterprise roles and a set of organizational practices...

Where Do SREs Go From Here?

Charlene O’Hanlon talks with Leo Vasiliou, director of product marketing at Catchpoint, about the results of a study the company fielded with VMware Tanzu and DevOps Institute of nearly 300 site reliability engineers (SREs). This year’s report underscores the challenges of...

How to Build an Options-Based Observability Strategy

The global pandemic forced companies to accelerate their digital transformations, pushing them toward new technologies and architectures as they struggled to adapt to a changing world. Companies wanted fast development and deployment and were happy to trade off efficiency, reliability and...

12 Ways to Bake Security Into a DevOps Transformation

Security has become an integral part of any DevOps transformation. According to the Upskilling 2021: Enterprise DevOps Skills Report, DevSecOps achieved a must-have percentage vote of 56% in the automation tool category. Security not only protects the business and its customers,...

SREs Say AIOps Doesn’t Live Up to the Hype

What can you expect when investing in artificial intelligence for IT operations (AIOps)? Real-time visibility across huge volumes of information. Lightning-fast event correlation and anomaly detection. Automated remediation and self-healing, without Ops personnel having to lift a finger. It all sounds...

Code Ownership Is Key to Accelerate Debugging

App stability is a fundamental part of every app experience. Broadly speaking, app stability is a measurement of the number of total app sessions that are crash-free or the percentage of daily active users who do not experience an error. End...

Survey Reveals Slight Decline in Level of SRE Toil

The amount of routine toil that site reliability engineers (SREs) perform declined slightly in the last year even though IT environments in general are becoming more complex to manage. An annual survey of 300 SREs conducted by Catchpoint, a provider of...

Using Chaos Engineering to Build Resilient Systems

Over the past decade, chaos engineering has become one of the most popular approaches in DevOps. It’s uniquely adapted to complex cloud-based systems and has the potential to succeed where more conventional approaches may not. Chaos Engineering Explained: Traditionally, DevOps teams...