From Observability to Architectural Observability—Shifting Left for Resiliency

Architectural Observability allows organizations to shift left for resiliency, focusing on the root of their system problems, not just the symptoms.

In my previous article, Managing Architectural Tech Debt, I talked about understanding and managing architectural technical debt. It is one of the most often ignored and, ironically, one of the most damaging categories of technical debt.

In this article, I want to dive deeper into one way to manage architectural technical debt (and technical debt as a whole): architectural observability (AO). AO is a new category of observability that I believe is at least as important as application performance management (APM). I believe we need to shift observability left, to the architectural stage, where we can not only see symptoms but also fix core problems.

Let’s take a look.

APM is (half) the answer

You already know that APM is important. Gartner defines it as “software that enables the observation and analysis of application health, performance and user experience.” And IDC reports that companies using APM solutions see a 2.5x improvement in mean time to resolution (MTTR) and a 50% reduction in the number of incidents.

APM and observability:

  • Help ensure better user experiences by monitoring performance and responsiveness in real time.

  • Can help identify defects.

  • Can provide data to teams, such as usage patterns, bottlenecks, and overall health to keep systems healthy.

Overall, APM has become a necessary tool for troubleshooting and fixing issues in enterprise environments. APM has become table stakes.

And APM works! In my current role, we use APM to observe the usage of our APIs and understand the breakdown of URI (uniform resource identifier) requests across all of our consumers. When our APIs are not functioning as expected, we lean on APM to gain visibility into performance bottlenecks. When an alert is triggered, the same interface can be used for initial troubleshooting to help pin down the root cause.

But even though APM works, there’s a problem. APM identifies the symptoms of the defects, but not the actual defects themselves. It’s up to the team to track down why the problems are occurring. And with the pressure we often feel in prod to “just fix the problem as fast as you can,” I often see that while symptoms may be addressed, teams don’t have the time (or organizational support) to find and fix the actual core problems.

Imagine taking aspirin because you get a headache every night, but never taking the time to figure out why you keep having headaches.

To find—and address—the why of our defects, we need architectural observability.

Architectural observability: getting to the real answers

We need to shift our processes left, stop focusing on symptoms, and instead focus on the root causes of these problems. Doing so actually reduces the number of incidents that APM has to catch.

That’s where architectural observability comes in.

Architectural observability is the ability to analyze an application’s architecture (both statically and dynamically), understand how it works, observe changes, and identify and fix architectural technical debt.

Architectural observability is the next step in observability tools.

Architectural observability gives you visibility into your application architecture, helping you solve problems (not just identify symptoms) earlier in the SDLC by identifying architectural issues.

You probably already have the data you need to implement AO, because it uses the same data sources as APM (for example, OpenTelemetry, or OTel). But AO takes that data and applies a layer of intelligence that focuses on analyzing the architecture and the sources of architectural technical debt.
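
As a rough illustration of that extra layer, the sketch below derives service-to-service dependency edges from trace spans. The record shape (plain dicts with `span_id`, `parent_span_id`, and `service` keys) and the function name `service_edges` are simplifying assumptions made for this example. They are not the OpenTelemetry SDK’s data model, and this is not how any particular AO tool works internally.

```python
from collections import Counter

def service_edges(spans):
    """Count caller -> callee service pairs from parent/child span links.

    `spans` is a list of hypothetical, simplified span records.
    """
    service_by_span = {s["span_id"]: s["service"] for s in spans}
    edges = Counter()
    for s in spans:
        parent = s.get("parent_span_id")
        if parent is None or parent not in service_by_span:
            continue  # root span, or the parent span was not captured
        caller = service_by_span[parent]
        callee = s["service"]
        if caller != callee:  # skip child spans inside the same service
            edges[(caller, callee)] += 1
    return edges

# Hypothetical spans from a single trace
spans = [
    {"span_id": "a", "parent_span_id": None, "service": "gateway"},
    {"span_id": "b", "parent_span_id": "a", "service": "orders"},
    {"span_id": "c", "parent_span_id": "b", "service": "orders"},
    {"span_id": "d", "parent_span_id": "b", "service": "inventory"},
]
edges = service_edges(spans)
```

Here `edges` holds one call from `gateway` to `orders` and one from `orders` to `inventory`. The same telemetry that feeds an APM dashboard becomes a dependency graph once it is aggregated this way.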

For example, an AO tool might analyze:

  • Architectural complexity - the interdependence and relationships of services within the architecture, the number of flows in a service, and the presence of multi-hop and circular flows.

  • Dependency mappings - relationships among services including circular dependencies.

  • Architectural drift - what has changed since your last release, what new domains/services were added, what new dependencies and flows were introduced.

  • Technical debt - such as resource exclusivity, service dependencies, duplicate services that should be merged, and complexity. Technical debt is a huge problem in the industry. 70% of organizations say that technical debt is a major obstacle to innovation.

  • Database-related issues - whether multiple services are accessing the same tables.
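
To make the circular dependency check from the list above concrete, here is a minimal sketch that runs a depth-first search over a service dependency graph. The graph format (a dict mapping each service to the services it calls), the service names, and the function name `find_cycle` are assumptions for illustration only.

```python
def find_cycle(graph):
    """Return one circular dependency as a list of services, or None.

    `graph` maps a service name to the services it depends on.
    """
    WHITE, GRAY, BLACK = 0, 1, 2  # unvisited, on current path, finished
    color = {node: WHITE for node in graph}
    path = []

    def visit(node):
        color[node] = GRAY
        path.append(node)
        for dep in graph.get(node, ()):
            state = color.get(dep, WHITE)
            if state == GRAY:
                # dep is already on the current path, so we closed a loop
                return path[path.index(dep):] + [dep]
            if state == WHITE:
                cycle = visit(dep)
                if cycle:
                    return cycle
        path.pop()
        color[node] = BLACK
        return None

    for node in list(graph):
        if color[node] == WHITE:
            cycle = visit(node)
            if cycle:
                return cycle
    return None

# Hypothetical dependency map
graph = {
    "orders": ["billing"],
    "billing": ["inventory"],
    "inventory": ["orders"],
    "gateway": ["orders"],
}
cycle = find_cycle(graph)
```

For this hypothetical map, `cycle` is the loop orders, billing, inventory, and back to orders. A real AO tool would do far more than this, but the core question is the same: does any chain of dependencies lead back to where it started?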

Architectural observability is proactive and strategic. Where APM tools alert on the leaks in the roof when it is already raining, AO identifies architectural issues that can lead to those leaks, way before they actually occur.

Using Tools to Gain Architectural Observability

Used well, AO doesn’t just help you find issues earlier. It also helps you:

  • Truly discover and understand your architecture and its relationships and dependencies.

  • Prevent issues caused by architectural changes.

  • Make systems more resilient and scalable by continually monitoring, modernizing, and strengthening your architecture.

  • Minimize technical debt.

I love that last one. As I pointed out in my last article, architectural debt is a foe of mine that I have been battling for over a decade.

Architectural observability is a new field, and is starting to gain traction as something teams must have. There aren’t many tools yet built around the concept, but let’s look at how your team might use one of the first AO tools—vFunction—to gain AO.

Once you’ve connected to your applications (through the OTel connector or similar), the tool analyzes your system. In vFunction’s case, it uses AI to understand and analyze your architecture.

Then you get a report on the current state of your architecture. You’ll see details such as:

  • A visualization of the architecture across your entire app portfolio

  • A map of services and entry points, cross-service APIs, and external APIs

  • Exclusivity of database tables and other resources (Kafka, Redis, MongoDB)

  • Complexity scores

  • And more…
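
The table exclusivity item in that report can be pictured with a small sketch: given a log of which service touched which table, list the tables that are not exclusive to one service. The `(service, table)` tuple format, the names in the sample data, and the function `shared_tables` are all hypothetical and only meant to show the idea.

```python
from collections import defaultdict

def shared_tables(access_log):
    """Map each table touched by more than one service to those services."""
    services_by_table = defaultdict(set)
    for service, table in access_log:
        services_by_table[table].add(service)
    return {
        table: sorted(services)
        for table, services in services_by_table.items()
        if len(services) > 1
    }

# Hypothetical (service, table) access records
access_log = [
    ("orders-service", "orders"),
    ("billing-service", "orders"),
    ("billing-service", "invoices"),
    ("inventory-service", "stock"),
]
shared = shared_tables(access_log)
```

In this sample, only the `orders` table is shared (by `billing-service` and `orders-service`), which makes it a candidate for closer review before splitting services apart.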

And you can use architectural observability to monitor your architecture not just in the present state, but dynamically as it changes.

  • Changes to your architecture (a.k.a., architectural drift)? You’ll know right away.

  • Add a dependency impacting resiliency? You’ll catch it before migrating.

  • Did you create a circular flow? Did you significantly increase the complexity of your system? Find out early.

  • Incurring even more technical debt? You can’t hide from it now.
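
One simple way to think about drift detection is as a diff between two snapshots of the dependency graph, for example the last release and the current build. The sketch below assumes each snapshot is a set of `(caller, callee)` pairs. That representation, the service names, and the function name `architecture_drift` are illustrative assumptions, not a description of any specific tool.

```python
def architecture_drift(baseline, current):
    """Compare two sets of (caller, callee) dependency edges."""
    base_services = {svc for edge in baseline for svc in edge}
    cur_services = {svc for edge in current for svc in edge}
    return {
        "added_services": sorted(cur_services - base_services),
        "removed_services": sorted(base_services - cur_services),
        "added_dependencies": sorted(current - baseline),
        "removed_dependencies": sorted(baseline - current),
    }

# Hypothetical snapshots: last release vs. current build
baseline = {("gateway", "orders"), ("orders", "inventory")}
current = {
    ("gateway", "orders"),
    ("orders", "inventory"),
    ("orders", "billing"),
    ("billing", "orders"),
}
drift = architecture_drift(baseline, current)
```

In this example the diff reports one new service (`billing`) and two new dependencies between `orders` and `billing`, which together form a new circular flow. Running a check like this on every build is one way to catch drift before it reaches production.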

With architectural observability, you now have hard proof of how architectural debt is affecting your systems. You can identify and prioritize the actual problems in your systems early and often, rather than only addressing the symptoms in production. AO makes applications more resilient and more scalable, and helps your team move faster.

And AO can help proactively.

Is your team moving a monolith to microservices? AO can give you a plan to move your architecture forward. With AO, you could analyze your monolith, understand how its domains and functionality are structured and connected, and then get actionable steps on how to modularize and move the functionality to microservices.

And once you have a distributed architecture (microservices or distributed monoliths), you’ll want to make sure your architecture doesn’t drift or grow so complex that things start to break, you lose control, and your engineering velocity slows down. AO keeps your applications in check, whether they are monoliths or microservices.

Architectural Observability Gives You Better Systems

My readers may recall that I have been focused on the following mission statement, which I feel can apply to any IT professional:

“Focus your time on delivering features/functionality that extends the value of your intellectual property. Leverage frameworks, products, and services for everything else.”

- J. Vester

Architectural observability adheres to my mission statement perfectly. By moving from plain vanilla observability to architectural observability, we can shift left for more reliable, more resilient, and better performing systems. This shift affords us more time to spend on the business problems that our teams understand best.

As John F. Kennedy once said: “The time to repair the roof is when the sun is shining.”

Architectural observability should be placed on the same level (or higher) as APM. With AO, teams can truly understand their architecture, have a broader and more insightful view of their applications than with APM alone, and fix problems earlier.

Have a really great day!