Turning a new page with modern monitoring in 2023
2023 is going to be the year of better monitoring and response services! That will not only transform how I use my resources day to day but, more importantly, streamline what we deliver to our customers.
We've used PagerDuty, an acclaimed incident response platform, for the last few years. This year, however, we're doing an extensive overhaul (think a complete, from-scratch reimplementation) as we look to extend our use of Event Intelligence and automation.
We already love PagerDuty (as do our customers). It's a truly modern solution that ingests logs and monitoring data from any number of sources. It then brings all those alerts, metrics, and endpoints together in a single pane, so you have a complete, real-time view of every potential issue in your environment. And if you're out of the office, you won't miss a thing: it pushes instant notifications to the incident responders and phones them if there's a critical problem (because cybercrime and system outages never sleep).
Until now, much of our managed services team's work has involved investigating live alerts for our service subscribers and responding appropriately. Because some of the environments we support are complex, this is often a manually intensive task. Using the latest functionality in PagerDuty, we'll move from a reactive, manual process to a fully automated system, configuring hands-off workflows that spring into action when an alert comes in from a particular asset or device. The remediation itself will also be automated as far as possible, for example restarting a service that has gone offline or running another basic fix.
Why does this make me happy? The new PagerDuty automation and workflow functionality is very powerful. The turnaround time on customer issues will drop significantly because we won't need to log in, assess, and process every alert by hand. We'll also gain valuable information about the types of issues occurring at any given time, so we can manage them in a more proactive, streamlined way.
While we already use a lot of automation to deliver managed services, this will take it to another level. We'll be able to use AI to reduce system outages by learning which types of events occur and when, and to produce trend analysis that helps us reduce the impact of those events on customers' environments. This will give customers more stability and free up our engineers to spend more time on proactive, continual improvement work such as health checks and assessments.
The new functionality in PagerDuty will speed up our response times and could reduce our customers' ongoing costs. It will also make my role easier, knowing that we're doing more for you, more efficiently, while freeing up the resources I need to drive continuous improvement.
General Manager, Enterprise Cloud & Security