Wednesday, June 10, 2009

SOA and System Fragility

"What is it that makes a design rigid, fragile and difficult to reuse. It is the interdependence of the subsystems within that design. A design is rigid if it cannot be easily changed. Such rigidity is due to the fact that a single change to heavily interdependent software begins a cascade of changes ... When the extent of that cascade of change cannot be predicted by the designers or maintainers the impact of the change cannot be estimated."

That excerpt is not taken from a recent SOA book or paper - it's from "OO Design Quality Metrics, An Analysis of Dependencies" by Robert Martin, written back in 1994.

In the object-oriented development space, heuristics have been developed over the years to judge the quality of the underlying code. Here I'm going to argue that we can take a number (but not all) of those very same heuristics and apply them to achieve what SOA set out to achieve: reduced complexity, and increased flexibility, robustness, and ease of reuse. I believe these heuristics carry over to SOA because they can be applied to any system. It doesn't matter whether the system was developed under a structured programming, object-oriented, or service-oriented approach. Nor does it matter at what level of abstraction the methodologies, architectural styles, and technologies operate.

The Stable Dependencies Principle

As far as good SOA design goes, this is surely the most important principle. It states that dependencies must run in the direction of stability, i.e. the dependee must be more stable than the depender. When SOA-based architectures are devised this principle is often not adhered to, with the unfortunate consequence that changes in the lower dependee layers send changes and huge testing overheads rippling up through the enterprise.
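
One way to make "more stable" concrete is the instability metric from the Martin paper quoted above, I = Ce / (Ca + Ce), where Ca counts incoming dependencies and Ce counts outgoing ones. The Python sketch below applies it to a made-up service graph and flags any dependency that points at a less stable service. The service names are hypothetical, and using the metric at service granularity is an assumption for illustration. It is only a structural proxy and says nothing about a service's age or how often it is patched.

```python
from collections import defaultdict

# Hypothetical service graph: each key depends on the services listed.
DEPENDS_ON = {
    "order-portal": ["order-process", "customer-data"],
    "order-process": ["customer-data"],
    "customer-data": ["beta-cache"],
    "beta-cache": ["vendor-a", "vendor-b", "vendor-c"],
    "vendor-a": [],
    "vendor-b": [],
    "vendor-c": [],
}


def instability(depends_on):
    """Return I = Ce / (Ca + Ce) per service; 0 is most stable, 1 least."""
    fan_in = defaultdict(int)
    for targets in depends_on.values():
        for target in targets:
            fan_in[target] += 1
    scores = {}
    # Include services that only ever appear as a dependee.
    for service in set(depends_on) | set(fan_in):
        ce = len(depends_on.get(service, []))  # outgoing dependencies
        ca = fan_in[service]                   # incoming dependencies
        scores[service] = ce / (ca + ce) if (ca + ce) else 0.0
    return scores


def violations(depends_on):
    """List (depender, dependee) pairs where the dependee is less stable."""
    scores = instability(depends_on)
    return [
        (src, dst)
        for src, targets in depends_on.items()
        for dst in targets
        if scores[dst] > scores[src]
    ]


print(violations(DEPENDS_ON))
```

In this toy graph the only flagged edge is customer-data depending on beta-cache: a heavily reused service leaning on one with many outgoing dependencies of its own.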

The upshot of this principle is that an enterprise SOA must be built like a well-designed building, with strong foundations and good supporting walls. And as with any building, ripping out and replacing the foundations is always going to be an expensive exercise that should ideally be avoided. Some corollary example heuristics include:
  • Stable services should avoid dependencies on volatile services.
  • Volatile services are usually young - as a rule of thumb, stable mature services should avoid depending on younger, more volatile services.
  • Avoid entangling your core, stable, business process logic with platforms with short life-spans.
  • When deploying middleware, avoid volatile betaware.
Sometimes, however, the Stable Dependencies Principle is very difficult to adhere to. Consider a simple dependency on middleware that requires continuous patching and updating. In this case it's almost impossible not to break the principle, and we wear the costs as a result. (As an example, during one recent exercise an operating system upgrade had to be abandoned because the costs and risks across the enterprise were simply too high.) With respect to middleware, the foundations should, as far as is practical in today's ever-changing technology landscape, be built on mature, capable, low-volatility products with a proven track record.

To some extent you can protect a depender from changes in a dependee via a suitable intervening, less volatile, isolation layer. In most software systems this isolation layer is simply an API, or Java interface, or SOAP WSDL definition, while in other cases it may be something considerably more complex such as a virtual machine monitor (or hypervisor). However, depending on the nature of the changes to the underlying implementations of the interfaces, the testing impact may still be considerable.
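
As a minimal sketch of such an isolation layer in Python: the depender is written against a small interface, and an adapter is the only code that knows the shape of the volatile dependee. Every name here (CustomerLookup, VendorCrmV2Client, fetch_record, send_invoice) is hypothetical and stands in for whatever the real services expose.

```python
from typing import Protocol


class CustomerLookup(Protocol):
    """Stable interface the depender is written against (hypothetical)."""

    def email_for(self, customer_id: str) -> str: ...


class VendorCrmV2Client:
    """Stand-in for a volatile dependee whose record layout may change."""

    def fetch_record(self, key: str) -> dict:
        return {"id": key, "contact": {"email": f"{key}@example.com"}}


class VendorCrmAdapter:
    """Isolation layer: the only code that knows the vendor's record layout."""

    def __init__(self, client: VendorCrmV2Client) -> None:
        self._client = client

    def email_for(self, customer_id: str) -> str:
        return self._client.fetch_record(customer_id)["contact"]["email"]


def send_invoice(lookup: CustomerLookup, customer_id: str) -> str:
    """Depender: sees only CustomerLookup, never the vendor client."""
    return f"invoice sent to {lookup.email_for(customer_id)}"


print(send_invoice(VendorCrmAdapter(VendorCrmV2Client()), "c42"))
```

If the vendor client changes its record layout, only the adapter needs editing. A change in behaviour behind the interface would still need to be tested from the depender's side.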

Coming back to Company C - the biggest problem was a lack of adherence to the Stable Dependencies Principle. The organisation actively encouraged reuse of all services, without regard to their volatility and without active management of the degree and nature of their reuse. Although the interfaces of the services were very well managed, and an excellent automated testing framework was in place, this single oversight caused considerable overhead in terms of cost and testing effort.

