Wednesday, 16 February 2011

Cleaner Code with the Scala Type System

Last week I happened to be sitting with a colleague looking at some prototype Java code. We came across a method that neither of us was particularly happy with, but which represented the typical messy code that occurs when wanting to do anything involving types and classes in Java. My initial comment was that it "would be so much cleaner to do in Scala". I therefore thought I'd better have a play with the code to prove (if only to myself) that I was right. This post discusses the result of my effort and my conclusions from having done this work.

The Problem

In order to understand the code, let's first look at the problem that it is trying to solve:

The domain is one related to content that can appear on a website. We have a persistent domain model representing this content. However, a number of the content items are sourced from, and can be updated in, external systems (e.g. a CMS or a remote content provider). We therefore need to synchronise the state of these external systems into our domain model from time to time. The Java code that we were looking at was inside this synchronisation component.

The content is retrieved from the external systems, wrapped in a subtype of ForeignObject. The subtype is based on where the content was obtained from. The ForeignObject implementations can examine the content they contain and return information about its type and its unique business identifier. The synchronisation code then uses this information to look up the associated object in the persisted domain (which may or may not exist). The ForeignObject and domain object are then passed to a mapper that updates the domain object from data that it extracts from the ForeignObject instance.

The full source code and tests described below can be obtained from my GitHub account.

The Java Code

Let's first take a look at the Java code that we were less than happy about. Firstly, the ForeignObject implementation that returns the detail about the type of content that it contains:

public class CMSForeignObject implements ForeignObject {
    // Other code removed

    public Class getLocalClass() {
        if ("Story".equals(type)) return Story.class;
        else if ("Image".equals(type)) return Image.class;
        else if ("BinaryObject".equals(type)) return BinaryObject.class;
        else if ("RelatedLinks".equals(type)) return RelatedLinks.class;
        else throw new IllegalStateException("Unknown type: " + type);
    }
}

As we can see, it looks at some string-based type field that it extracts from the wrapped content and returns the associated domain class that represents that content. Perhaps not the nicest code, and returning class instances always feels a bit wrong. However, I don't have too many problems with this approach. Where the messy code comes in is in the synchronisation logic that uses this class:

public class SynchronisationService {
    // Setup code removed for clarity

    @SuppressWarnings( { "unchecked" })
    public Content sync(final ForeignObject foreignObject) {
        return findById(foreignObject.getLocalClass(), foreignObject);
    }

    private <T> T findById(final Class<T> clazz, final ForeignObject foreignObject) {
        if (Story.class.equals(clazz)) {
            CMSForeignObject cmsForeignObject = (CMSForeignObject) foreignObject;
            return (T) repository.findStoryById(cmsForeignObject.getId());
        }
        else if (Image.class.equals(clazz)) {
            CMSForeignObject cmsForeignObject = (CMSForeignObject) foreignObject;
            return (T) repository.findImageById(cmsForeignObject.getId());
        }
        else if (BinaryObject.class.equals(clazz)) {
            CMSForeignObject cmsForeignObject = (CMSForeignObject) foreignObject;
            return (T) repository.findBinaryObjectById(cmsForeignObject.getId());
        }
        else {
            throw new IllegalStateException(
                "We have managed to try and synchronise an unsupported class. This is a bug.");
        }
    }
}

How's that for a nice bit of logic? Imagine when we have many different types! I know this is only prototype code and there's certainly plenty we can do to improve it. However, it's quite typical to see this sort of stuff in production code whenever you have to go down into type based logic in Java.

The First Scala Version

To come up with a better solution, the first thing I did was to convert the Java code directly into Scala. Nothing special here, other than using Scala syntax and modifying the repository methods to return Option[Story] rather than null. So, first the ForeignObject implementation:

class CMSForeignObject(typeName: String, val id: String) extends ForeignObject {
  def localClass[C <: Content]() = typeName match {
    case "Story" => classOf[Story]
    case "Image" => classOf[Image]
    case "BinaryObject" => classOf[BinaryObject]
    case "RelatedLinks" => classOf[RelatedLinks]
    case _ => throw new IllegalStateException("Unknown type: " + typeName)
  }
}

And the synchronisation code:

class SynchronisationService(repository: ContentRepository) {

  def sync(foreignObject: ForeignObject): Option[Content] = findById(foreignObject.localClass, foreignObject)

  private def findById[T <: Content](clazz: Class[_], foreignObject: ForeignObject): Option[T] = {
    val cmsForeignObject = foreignObject.asInstanceOf[CMSForeignObject]
    if ( clazz == classOf[Story] ) repository.findStoryById(cmsForeignObject.id).asInstanceOf[Option[T]]
    else if ( clazz == classOf[Image] ) repository.findImageById(cmsForeignObject.id).asInstanceOf[Option[T]]
    else if ( clazz == classOf[BinaryObject] ) repository.findBinaryObjectById(cmsForeignObject.id).asInstanceOf[Option[T]]
    else throw new IllegalStateException("We have managed to try and synchronise an unsupported class. This is a bug.")
  }
}

Works fine, but not really much better than the Java code.

Adding Scala Type Goodness

Once I had an initial Scala version I then spent a couple of hours playing around with the code and some Scala type goodness. The resulting synchronisation code is:

class SynchronisationService(repository: ContentRepository) extends ContentFinder {

  def sync(foreignObject: ForeignObject): Option[Content] = findById(foreignObject)

  private def findById[C <: Content, FO <: ForeignObject](foreignObject: FO) =
    Option(finder(findStory) orElse finder(findImage) orElse 
           finder(findBinaryObject) apply foreignObject)

  private def findStory(foreignObject: CMSForeignObject) = repository findStoryById (foreignObject id)
  private def findImage(foreignObject: CMSForeignObject) = repository findImageById (foreignObject id)
  private def findBinaryObject(foreignObject: CMSForeignObject) = repository findBinaryObjectById (foreignObject id)
}

No nasty if/else block. No need to deal with Class instances. No nasty type casting. Much more readable. Hopefully this is much more friendly for developers and will be far easier to maintain in the future as we come to add more types of content to be synchronised. So, how did I manage to achieve this much more readable code?

The first piece of the puzzle was to eliminate the returning of Class instances from the ForeignObject. Instead we now return a new ContentType instance:

class CMSForeignObject(typeName: String, val id: String) extends ForeignObject {

  def contentType[C <: Content, FO <: ForeignObject] = {
    val contentType = typeName match {
      case "Story" => new ContentType[Story, CMSForeignObject]
      case "Image" => new ContentType[Image, CMSForeignObject]
      case "BinaryObject" => new ContentType[BinaryObject, CMSForeignObject]
      case "RelatedLinks" => new ContentType[RelatedLinks, CMSForeignObject]
      case _ => throw new IllegalStateException("Unknown type: " + typeName)
    }
    contentType.asInstanceOf[ContentType[C, FO]]
  }
}

The ContentType instance supports generics on both the type of content and the type of foreign object. It captures the actual erased types used via implicit manifests on its constructor:

class ContentType[C <: Content, FO <: ForeignObject]
        (implicit val mC: Manifest[C], val mFO: Manifest[FO]) {

  def compatibleWith(cClass: Class[_], foClass: Class[_]) = (cClass == mC.erasure) && (foClass == mFO.erasure)
}

It also provides a compatibleWith method that is used internally within the finder code that we will look at below.
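The type-capture trick can be seen in isolation. Here is a minimal, self-contained sketch (with hypothetical names, and using ClassTag, the modern replacement for the Manifest API used in the post) showing how an implicit captures the erased type parameter at construction time:

```scala
import scala.reflect.ClassTag

// Captures the erased class of T at construction time via an implicit ClassTag
class TypeCapture[T](implicit tag: ClassTag[T]) {
  def matches(c: Class[_]): Boolean = c == tag.runtimeClass
}

object TypeCaptureDemo {
  val stringCapture = new TypeCapture[String]
  val hit  = stringCapture.matches(classOf[String])  // the erased type is recovered
  val miss = stringCapture.matches(classOf[Integer]) // a different class does not match
}
```

This is exactly what ContentType does with its two Manifest parameters: each instance remembers which content class and foreign-object class it was created for, so compatibleWith can compare them at runtime.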

The final piece of the puzzle is the implementation of the ContentFinder trait that is mixed into the synchronisation code. This is where most of the type magic occurs in order to simplify the actual synchronisation code:

trait ContentFinder {
  implicit protected def finderToPartialFunction[FO <: ForeignObject, C]
        (handler: LookupHandler[FO, C])(implicit mFO: Manifest[FO], mC: Manifest[C]) =
    new PartialFunction[AnyRef, C] {
      def apply(foreignObject: AnyRef) =
        if ( isDefinedAt(foreignObject) ) handler(foreignObject.asInstanceOf[FO])
        else throw new IllegalStateException("We have managed to try and synchronise an unsupported class. This is a bug.")

      def isDefinedAt(foreignObject: AnyRef) =
        foreignObject.asInstanceOf[FO].contentType.compatibleWith(mC.erasure, mFO.erasure)
    }

  protected def finder[FO <: ForeignObject, C](lookup: FO => Option[C]) = new LookupHandler(lookup)

  protected class LookupHandler[FO <: ForeignObject, C](lookup: FO => Option[C]) {
    def apply(foreignObject: FO) = lookup (foreignObject).getOrElse(null).asInstanceOf[C]
  }
}

The first insight that I had when working with this code is that the if/else condition across classes is actually a partial function across content types. We have a ForeignObject that can return a ContentType instance that contains the type information about the type of content and the type of foreign object. We therefore just have to see if there is a partial function defined at that particular combination and if there is then we can apply it.

The resulting implementation therefore defines a finder method that takes a function containing the actual code that should be applied to the ForeignObject/Content pair. It returns this wrapped in a LookupHandler instance. The synchronisation code can then just define find functions for each type that it can deal with and pass them to the finder method. Finally we define an implicit conversion for the LookupHandler instances that converts them to partial functions so that we can compose them with the orElse method. The partial function captures the erased types from the handler using implicit manifests and compares these with those captured by the content type to see if the function can be applied.
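The orElse composition at the heart of this can be shown in a minimal, self-contained sketch (simplified types: plain Strings stand in for the ForeignObject/Content machinery). Each finder is a PartialFunction, and orElse chains them so the first one defined at the input handles it:

```scala
object PartialFunctionDemo {
  // One handler per content type; each is only defined for its own type
  val findStory: PartialFunction[String, String] = {
    case "Story" => "found a Story"
  }
  val findImage: PartialFunction[String, String] = {
    case "Image" => "found an Image"
  }

  // Compose: try findStory first, fall back to findImage
  val find: PartialFunction[String, String] = findStory orElse findImage
}
```

Here `PartialFunctionDemo.find("Image")` yields "found an Image", while `find.isDefinedAt("Video")` is false, which mirrors the unsupported-class case in the real code.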


We can clearly see that the resulting synchronisation code is significantly cleaner, more easily understood and more maintainable than the original Java code. However, this comes at the expense of slightly more lines of Scala code and some fancy logic around finders and implicit partial functions. Is the benefit really worth the extra?

I can see both sides of the argument. I can certainly understand that for the Scala novice the ContentFinder is somewhat impenetrable. However, it's the synchronisation code that is likely to be updated a number of times as more types of content are brought into the domain. The code that deals with all of the type goodness is unlikely to change much and can easily be maintained by a more experienced Scala developer. Therefore the readability and maintainability of the business code wins out in my mind over some extra, well abstracted complexity.

This tends to be the typical approach to the Scala language and one of the reasons that it can be so incredibly powerful. Use all the power of the language and its incredible type system in the framework and support code so that the actual application logic can be as minimal and simple as possible.

Thursday, 10 February 2011

Selecting The Correct Test Scenarios and Data

Yesterday I happened to be reviewing some integration tests with my excellent tester colleague. These tests were defined using a set of highly simplified synthetic data selected for the purpose of making the tests pass. We both agreed that tests should actually be written using a mixture of realistic and synthetic data selected for the purpose of trying to make the tests fail! As developers we are often too keen to write tests that prove that our code is good rather than writing tests to find the problems in our code.

This exercise brought to mind a situation I had a few years ago with some code provided by an offshore development team. It was a very simple piece of code and came with some tests to show that it worked. However, it exhibited a single fundamental flaw that the synthetic data designed to make the tests pass would never have found. I've recalled the problem below (somewhat simplified), and I've updated the code from Java/JUnit to Scala and Cucumber (because quite frankly I prefer these languages!).

The Requirement

The specific requirement for the system being built was that it would collect a user's date of birth during the sign-in process. If the user was under 17 then they would not be allowed to use the system. If they were 17 or over then they would be taken to the main menu. The particular piece of code in question was the age validation algorithm.

The Supplied Solution

The initial solution to this problem consisted of the following piece of code:

class AgeValidation {
  private val AuthorisedAge = 17

  def isAuthorisedAge(user: User) = {
    val cal = Calendar.getInstance
    val currentYear = cal.get(Calendar.YEAR)

    cal.setTime(user.dateOfBirth)
    val birthYear = cal.get(Calendar.YEAR)

    (currentYear - birthYear) >= AuthorisedAge
  }
}

Tests were supplied that covered two specific scenarios:

Feature: The application should ensure only users over 17 are allowed access
    As Application Security
    I want only users aged over 17 to access the system

    Scenario: User aged over 17 can access the system
        Given a user aged over 17
        When the age check is applied
        Then they are authorised to continue

    Scenario: User aged under 17 cannot access the system
        Given a user aged under 17
        When the age check is applied
        Then they are not authorised to continue

And the implementation of these tests was as follows:

class AgeValidationSteps extends ScalaDsl with EN with ShouldMatchers {
  private var user: User = _
  private var valid: Boolean = _
  private val dateFormatter = new SimpleDateFormat("dd/MM/yyyy")

  Given("""^a user aged over 17$""") {
    user = new User("test", dateFormatter.parse("01/01/1970"))
  }

  Given("""^a user aged under 17$""") {
    user = new User("test", dateFormatter.parse("01/01/2010"))
  }

  When("""^the age check is applied$""") {
    val validator = new AgeValidation
    valid = validator.isAuthorisedAge(user)
  }

  Then("""^they are authorised to continue$""") {
    valid should be (true)
  }

  Then("""^they are not authorised to continue$""") {
    valid should be (false)
  }
}

Extra credit if you can spot the problem with this implementation and why the synthetic test data would have never found it before you read on!

The Problem With This Code

Did you spot it? Here's a hint: what if my 17th birthday is not until tomorrow? Yes, that's right, the code has a fundamental defect in that anyone who is 17 in the current year but who has not yet had their birthday is granted access to the system even though they are still 16. And yes, this was the actual algorithm submitted to us by the offshore developers!

Quite clearly, the synthetic test data used in the supplied tests was selected to ensure the tests passed, but not to find the problems in the code. Let's correct the feature spec to base it on values that actually might find something:

Feature: The application should ensure only users over 17 are allowed access
    As Application Security
    I want only users aged over 17 to access the system

    Scenario: User who was 17 last year can access the system
        Given a user whose 17th birthday was last year
        When the age check is applied
        Then they are authorised to continue

    Scenario: User who is 17 next year cannot access the system
        Given a user whose 17th birthday is next year
        When the age check is applied
        Then they are not authorised to continue

    Scenario: User who was 17 yesterday can access the system
        Given a user whose 17th birthday was yesterday
        When the age check is applied
        Then they are authorised to continue

    Scenario: User who is 17 today can access the system
        Given a user whose 17th birthday is today
        When the age check is applied
        Then they are authorised to continue

    Scenario: User who is 17 tomorrow cannot access the system
        Given a user whose 17th birthday is tomorrow
        When the age check is applied
        Then they are not authorised to continue

And the associated test steps:

class AgeValidationSteps extends ScalaDsl with EN with ShouldMatchers {

  private var user: User = _
  private var valid: Boolean = _
  private val dateFormatter = new SimpleDateFormat("dd/MM/yyyy")

  Given("""^a user whose 17th birthday was last year$""") {
    val cal = Calendar.getInstance
    cal.add(Calendar.YEAR, -18)
    user = new User("test", cal.getTime)
  }

  Given("""^a user whose 17th birthday is next year$""") {
    val cal = Calendar.getInstance
    cal.add(Calendar.YEAR, -16)
    user = new User("test", cal.getTime)
  }

  Given("""^a user whose 17th birthday was yesterday$""") {
    val cal = Calendar.getInstance
    cal.add(Calendar.YEAR, -17)
    cal.add(Calendar.DATE, -1)
    user = new User("test", cal.getTime)
  }

  Given("""^a user whose 17th birthday is today$""") {
    val cal = Calendar.getInstance
    cal.add(Calendar.YEAR, -17)
    user = new User("test", cal.getTime)
  }

  Given("""^a user whose 17th birthday is tomorrow$""") {
    val cal = Calendar.getInstance
    cal.add(Calendar.YEAR, -17)
    cal.add(Calendar.DATE, 1)
    user = new User("test", cal.getTime)
  }

  When("""^the age check is applied$""") {
    val validator = new AgeValidation
    valid = validator.isAuthorisedAge(user)
  }

  Then("""^they are authorised to continue$""") {
    valid should be (true)
  }

  Then("""^they are not authorised to continue$""") {
    valid should be (false)
  }
}

Run this set of tests against the existing code and the final scenario 'User who is 17 tomorrow cannot access the system' fails. Our tests have actually done something useful: they have proved that our code doesn't work. Test data selected with the specific goal of trying to find problems with the code has done its job. I'll leave it as an exercise for the reader to fix the implementation so that it all works correctly!

Further Improvements

Even with our improved set of test data there are still some significant improvements that can be made to these tests by selecting additional scenarios and test data values that might find problems:

  • Testing the algorithm works with data boundary cases such as first/last day of the year, leap years and so on
  • What happens when the user enters a date of birth in the future? Does the code still work?
  • What about if the person is over 100? Are we dealing with 4 digit or 2 digit years?
  • Is this an international system? If so, then what about timezones? A person in a timezone east of us may already be 17, but may not yet be 17 in our timezone.

Apply The Same Rules To Our Tests

We can also apply the same thinking to our tests. For example, if we run our improved tests against the original code on the last day of the year then they will all pass. The reason is that adding one day to the current date (minus 17 years) pushes the birth date into the following year, so the 'birthday is tomorrow' scenario no longer exposes the defect. We therefore need to think not only about how our synthetic test data might find problems in our code but also how it may cause problems in our tests. In this case, solving the problem is just a case of improving our code to allow our tests to define the current date rather than using whatever the current system date is:

class AgeValidation(baseDate: () => Date) {
  // ... age check as before, but calculated relative to baseDate() rather
  // than the current system date
}

object AgeValidation {
  def apply = new AgeValidation(() => new Date)
  def apply(baseDate: String) = new AgeValidation(() => new SimpleDateFormat("dd/MM/yyyy").parse(baseDate))
}

// And in the tests
val validator = AgeValidation("15/07/2010")

Our test code now has the ability to control exactly what current date is used as a basis for the comparison, so it can experiment with a wider range of possible scenarios.
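To show the injectable base date in action, here is a self-contained sketch (the User wrapper is dropped for brevity, and the date comparison shown is just one possible day-accurate check, not necessarily the 'official' fix to the exercise above):

```scala
import java.text.SimpleDateFormat
import java.util.{Calendar, Date}

class AgeValidation(baseDate: () => Date) {
  private val AuthorisedAge = 17

  // Compare full dates rather than years: a person is 17 or over if their
  // birth date falls on or before the base date minus 17 years
  def isAuthorisedAge(dateOfBirth: Date): Boolean = {
    val cutoff = Calendar.getInstance
    cutoff.setTime(baseDate())
    cutoff.add(Calendar.YEAR, -AuthorisedAge)
    !dateOfBirth.after(cutoff.getTime)
  }
}

object AgeValidationDemo {
  private val df = new SimpleDateFormat("dd/MM/yyyy")

  // Pin 'today' so the scenario is reproducible on any day
  val validator = new AgeValidation(() => df.parse("15/07/2010"))

  val seventeenToday    = validator.isAuthorisedAge(df.parse("15/07/1993")) // authorised
  val seventeenTomorrow = validator.isAuthorisedAge(df.parse("16/07/1993")) // not authorised
}
```

Because the base date is pinned, the 'birthday is today' and 'birthday is tomorrow' scenarios now behave identically no matter when the test suite runs.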

Does This Change Our TDD Approach?

All of the above got me thinking about how this impacts the Test-Driven Development approach. In this model we first write a test to validate the expected behaviour of our code. We then write just enough code that the test compiles but fails. Next we write the code to make the test pass. Finally we refactor to make the code better without breaking the tests. Most developers approach this by picking some synthetic values designed to prove that the 'happy path' through the code works once written. Perhaps we should instead focus our TDD on starting with tests for the scenarios that are most likely to identify problems - that way the code we write to make them pass has to deal with these cases from the outset. Certainly something I will be experimenting with in my next TDD session.

Tuesday, 8 February 2011

The Architecture Doesn't Work

I've been building web and other large-scale, enterprise type applications for many years, following the standard accepted approaches to application architecture. I've been increasingly feeling that these approaches are somewhat flawed and that we need something better. Over this series of three blog posts I'm going to talk about some new technologies and architectures that I have been having some great success with. In this first of the three posts I want to look at why...

The Architecture Doesn't Work

If like me you've been building enterprise style applications in Java (or pretty much most other similar languages) then you will be intimately familiar with this layered architectural pattern:

The Standard Architecture

For those who need a quick refresher, here's a summary of each layer:

  • Presentation Layer - Deals with rendering the current state of the application to the user/client. Could be a web page, a GUI application or even some structured representation (e.g. XML, JSON). Usually renders the contents of the Domain Objects, often by some form of template system.
  • Controllers - Handles events generated by the user/client and orchestrates the processing and Presentation. Typical approach is to accept an incoming request, extract any supplied request values, call a Service, pass the resulting Domain Objects to the appropriate Presentation component and finally return a result.
  • Service Layer - Executes business logic and processes against the Domain Objects and returns the resulting objects. Typically stateless and transactional. Processing usually involves using the Persistence Layer to load Domain Objects from a Persistent Store, working with the Domain Objects to mutate state, using the Persistence Layer to save the changes and then returning the resulting Domain Objects for presentation.
  • Domain Objects - The main objects representing entities in the business, their relationships and the business rules associated with each. Entities are usually persisted to a Persistent Store. Some people prefer anaemic Domain Objects, but I have argued that rich models are significantly better.
  • Persistence Layer - Provides an interface between the Service Layer and the Persistent Store that talks the language of Domain Objects. Allows the Service Layer to undertake CRUD (Create, Read, Update, Delete) and Find operations against the Persistent Store.
  • Persistent Store - The storage for the data in the application. Usually a Relational Database Management System (RDBMS). Domain Objects are typically mapped to tables. Supports transactional semantics.
  • Object-Relational Mapping (ORM) - A library that assists the mapping between Domain Objects and the Persistent Store. Tries to solve the impedance mismatch between object and relational structures. Used by the Persistence Layer. Modern solutions use annotations on the Domain Objects (e.g. JPA & Hibernate).
  • Container Services - General services provided by a container. Supports features including dependency injection, transactions, configuration, concurrency, service location, AOP, etc.
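The way these layers typically hang together can be sketched minimally (hypothetical names throughout; a real application would of course have domain objects rather than bare Strings, and a container wiring the pieces):

```scala
// Persistence layer: talks the language of the domain, hides the store
trait UserRepository {
  def findNameById(id: Long): Option[String]
}

// Service layer: business logic, delegates persistence to the repository
class UserService(repo: UserRepository) {
  def greetingFor(id: Long): String =
    repo.findNameById(id).map(name => "Hello, " + name).getOrElse("Unknown user")
}

// Controller: accepts a request value, calls the service, returns a result
// for the presentation layer to render
class UserController(service: UserService) {
  def handle(idParam: String): String = service.greetingFor(idParam.toLong)
}

object LayersDemo {
  // An in-memory stand-in for the persistent store
  private val repo = new UserRepository {
    def findNameById(id: Long) = if (id == 1L) Some("Ada") else None
  }
  private val controller = new UserController(new UserService(repo))

  val known   = controller.handle("1")
  val unknown = controller.handle("2")
}
```

Each layer only knows about the one directly beneath it, which is precisely the coupling structure the rest of this post examines.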

So, why doesn't this work?

It has become apparent to me over many projects that the above architectural approach suffers from a number of technical weaknesses. Often these have come about through the aim of simplifying the programming model for developers, but as we will see later the result is often the opposite. First, let's look at what I believe to be the main general technical problems that we find in this standard architecture. Later we will look at the impacts of these problems on the application and developers.

Thread-per-request model

Typically, in the above architecture, each request is handled in its own thread. Threads are usually managed in a pool and are then assigned to a request when it arrives. The thread and request are then linked until a response is returned, at which time the thread is returned back to the pool. This approach aims to simplify the development model by reducing the need for developers to have to think about concurrency issues - effectively giving them a single-threaded application path for handling each request.

Unfortunately, assigning a separate thread to each request is a very inefficient way of utilising system resources. Threads are fairly heavyweight entities, having to maintain their own stacks, address space and so forth. The work involved in scheduling them is also quite large and, given today's I/O-intensive applications, many of them will be in a blocked state at any point in time. Additionally, most systems can only support a finite number of threads before suffering degradation in performance. (Some languages have tried to overcome the thread problem by introducing lighter-weight constructs, such as fibers, but these still share many of the same problems for concurrent programming.)

The net effect of this thread-per-request approach is to stifle system scalability and hurt application performance. Additionally, when the need to share state occurs, developers must step outside their comfort zone into the world of concurrent programming: something the thread-per-request model has isolated them from.

Constant reloading from the persistent storage

Given a thread-per-request model, there is a natural approach to load all objects required to service a request from the persistent store on each request. Often this is completely unnecessary as a large percentage of the required data is unlikely to have changed from one request to another, or to be different between two separate running requests. However, with no convenient way to store state safely between requests or share infrequently changing state across threads we are forced down the load-everything-on-each-request route. The alternative is to drop down into shared state and concurrent programming - and we've already explored why that's not the best option! Techniques such as lazy loading try to minimise the overhead, but we have to accept that much of the data an application uses is being reloaded unnecessarily on each request.

Using database transactions for optimistic locking & concurrency control

Given each request is working with its own copy of application data kept in its own thread, how does the architecture handle multiple threads trying to update the same data at the same time? The general approach is to use database transactions to implement optimistic locking. Two requests can each load a separate copy of some data in their own threads and update it as they wish. When a thread's current transaction is committed, the state is written back to the persistent store. The database's transaction support ensures that each thread's changes are committed in an atomic and correctly isolated manner. Optimistic concurrency can follow a last-commit-wins model, or many ORM solutions offer a row versioning approach so the second transaction to commit changes to an object will fail with a stale data exception.
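The row-versioning form of optimistic locking described above can be reduced to a minimal sketch (hypothetical types; a real ORM does this against a version column and throws a stale-data exception rather than returning a Boolean):

```scala
// A record carries the version it was read at
case class Record(value: String, version: Int)

class VersionedStore {
  private var stored = Record("initial", version = 0)

  def load: Record = synchronized { stored }

  // A commit only succeeds if the version read is still the current one;
  // a stale writer is rejected and would have to retry
  def commit(update: Record): Boolean = synchronized {
    if (update.version == stored.version) {
      stored = update.copy(version = update.version + 1)
      true
    } else false
  }
}

object OptimisticLockDemo {
  val store = new VersionedStore
  val a = store.load                                  // both readers see version 0
  val b = store.load
  val first  = store.commit(a.copy(value = "from A")) // commits, bumps the version
  val second = store.commit(b.copy(value = "from B")) // stale: rejected
}
```

Note how the second writer completes all of its work before discovering the conflict - exactly the late-failure problem discussed next.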

However, relying on the lowest level in the application stack to prevent concurrent updates isn't always the best solution. For a start, it allows both transactions to pretty much complete all their processing before one discovers that all its work is no longer valid. Secondly it causes problems when concurrency in the application layers above is required as some state would be shared across threads, and thus transactions - not nice to deal with. Finally, it doesn't actually simplify the problem because instead of having to deal with concurrency, the developer must now deal with the possibility of transaction rollback due to stale data, and the need for replaying of transactions. You'd be surprised how many applications I've come across that either don't address the problem (just raise a 500 error) or try to do something in this area and fail miserably!

Tight coupling between domain and persistent store through ORM

Another problem inherent in this architectural model is the tight coupling between the domain objects and the persistent store. This is caused by the need for the domain objects to be closely coupled to the object-relational mapping (ORM) solution. Why would we want to couple our domain objects this tightly? Well, it turns out that quite often we have to do this for performance reasons related to needing to reload from the persistent store for each request. Given that we are loading data so frequently we find that we have to tune our domain model not to make it optimal to our business model but to allow us to work most efficiently with the persistent store. And, don't even get me started on annotation-based configuration (aka JPA) which forces us to consider and include all the details of our persistence strategy inside our representation of the business domain!

Scaling through adding servers

Once we have built our application using our architecture, we then have to deploy and scale it. Given that our thread-per-request approach doesn't make the best use of resources, the typical approach to scaling is to add additional servers, each running a separate copy of the application. Unfortunately this is not a particularly good way to achieve scale because we are still dependent on a shared database - and we are making heavy use of the database by loading data on each request and also using it for concurrency control.

Additionally, this approach to scalability also causes session based data sharing problems. With a couple of servers we can perhaps use session clustering with some success. As the number of servers grows we end up either having to use sticky sessions (i.e. tie each session to a single server - which means sessions are lost if that server crashes) or use some other way of persisting session data. This usually involves either adding session state into our already stressed database or introducing a whole new technology (distributed cache) to handle this data.

Static configuration

While not as significant as many of the above items, our standard architecture, and the libraries it is built upon, tends to push us towards a model of static configuration held in XML/YAML/Properties or other similar files. This increases the difficulty in managing our application at scale and in trying to make dynamic changes to values to tune the running system. When we want to support dynamic configuration we are often forced down the route of exposing properties to a management console (i.e. JMX) or pushing configuration down into the database or distributed cache, which we already saw above is not the ideal direction.

Poor approach to fault tolerance

Finally, we have the problem of fault tolerance. Typically, the approach to exceptions at any stage of processing a request is one of: catch the exception at the top level, log it and then display a generic "Oops, something went wrong!" message to the user. Hardly a great user experience. Unfortunately our architecture makes doing anything more than this prohibitively hard. For example, we might want to put some context information in the message. However, when the exception was thrown the developer might not have added this context information to the exception or, more likely, the exception was thrown by a library (such as the ORM) which is unaware of the context in the first place. We might also try to implement a re-try strategy, but implementing retries in a thread-per-request model is actually very difficult - you have to remember to throw everything you currently have away (because memory state in the thread may not be consistent with the database), clear all caches and so on. Try it, it's pretty hard!

And what is the impact?

So, given all of the above technical weaknesses with our current architectural approach, what effects do these have on our application?

Achieving performant applications is more difficult than it should be

The first major impact is that the architecture makes it hard to achieve application performance: the thread-per-request model makes poor use of resources, and loading unchanged data from the database on each request is inefficient and overloads the database.

Given these problems we generally find that we have to consider performance far earlier and in much greater depth than we really should. I'm not saying that we should never develop without considering performance (that would be negligent), but that the need to optimise our well written code and architecture for performance should be something we do rarely and in response to specific issues identified during performance testing. Designing a domain model so that it has the most efficient mapping to a database in order to achieve basic application performance is neither desirable nor the best way of capturing an accurate representation of the business domain. Compromising simple code for more complex but performant alternatives just to meet basic performance criteria is never desirable.

Another impact of the poor performance characteristics of our architecture is that we often end up having to circumvent the simplified thread-per-request model in order to get a performant and usable application. What do we mean by this? We have to add some form of shared state - exactly what the simplified request model was put in place to avoid. Typically, we do this in one of two ways: adding caching or doing concurrent programming.

When an application proves to be less performant than required, most developers will initially suggest adding caches for data and results that don't change regularly. There are plenty of places in an application where caching can be applied. However, implementing caching is SIGNIFICANTLY more complex than most developers like to assume. Let's consider some of the things that need to be addressed: how do we keep the cache in sync with the database? what's the expiry strategy? what's the update strategy? do we pre-warm? do we block reads during an update or just block concurrent updates? is a cache per server fine or do we need a distributed solution? how will we ensure we don't get cache deadlock? As you can see, implementing a cache is actually a very complex option which requires great skill to get right. Caches should be reserved for special cases and should not be a catch-all for badly performing code and architecture.
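To see how little a "simple" cache actually covers, here is a deliberately naive time-based cache sketch (the names are invented for illustration). It answers exactly one of the questions above - expiry - and punts on all the rest:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

public class NaiveCache<K, V> {
    private static final class Entry<V> {
        final V value;
        final long loadedAt;
        Entry(V value, long loadedAt) { this.value = value; this.loadedAt = loadedAt; }
    }

    private final Map<K, Entry<V>> entries = new ConcurrentHashMap<>();
    private final long ttlMillis;

    public NaiveCache(long ttlMillis) { this.ttlMillis = ttlMillis; }

    public V get(K key, Function<K, V> loader) {
        Entry<V> cached = entries.get(key);
        long now = System.currentTimeMillis();
        if (cached != null && now - cached.loadedAt <= ttlMillis) {
            return cached.value;  // still fresh: serve from the cache
        }
        // The unsolved problems start here: two threads can both miss and
        // load the same key concurrently; nothing keeps the value in sync
        // with the database during its TTL; there is no pre-warming; and
        // each server has its own copy, so this is per-server only.
        V value = loader.apply(key);
        entries.put(key, new Entry<>(value, now));
        return value;
    }
}
```

Every one of those unanswered questions is a design decision with failure modes of its own, which is the point: a cache this simple is easy; a cache that is actually correct is not.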

The alternative approach to a cache is to introduce some form of shared state with concurrent access. Rather than loading all objects on each request, we keep common objects in memory and share them across and between requests. However, this comes with many of the same problems as using caching, plus the requirement of transaction management for updating this shared state. If done right, shared state with concurrent access can be simpler, more elegant and more flexible than caching, but it's the doing it right which is the challenge.

Now, this might be an over-generalisation, but my experience has shown that most 'average' developers can't write concurrent code using shared mutable state, multiple threads and synchronisation blocks. They usually end up with deadlocks, inconsistent state and difficult to track down bugs. Why? Because it is HARD and also because the single-threaded thread-per-request model that they are used to programming to has isolated them from any considerations of concurrency. It's true that modern concurrency libraries (e.g. java.util.concurrent) have helped the problem, but most developers are using these without fully understanding why they need them. Just ask an average Java programmer to explain the 'volatile' keyword and you will see what I mean!
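For the record, here is the kind of answer that question is looking for - a small sketch of why 'volatile' matters for visibility between threads:

```java
public class VolatileDemo {
    // 'volatile' guarantees that a write by one thread is visible to
    // reads by other threads. Without it, the JIT may legally hoist the
    // read of 'stop' out of the loop, and the worker can spin forever.
    private volatile boolean stop = false;

    public boolean runAndStopWorker() {
        Thread worker = new Thread(() -> {
            while (!stop) {
                // busy-wait until another thread sets stop
            }
        });
        worker.start();
        try {
            Thread.sleep(50);      // let the worker start spinning
            stop = true;           // visible to the worker because of volatile
            worker.join(2000);     // should return almost immediately
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
        return !worker.isAlive();  // true: the worker saw the write and exited
    }
}
```

A developer who can explain why removing 'volatile' here may cause the worker to hang has the visibility model; one who cannot is using java.util.concurrent on faith.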

Additionally, our dominant L-mode brain activity is optimised for linear, single-threaded thought. This in turn makes thinking about concurrency and all its implications a very unnatural process. I don't have any proof, but I have some very strong suspicions that a large proportion of the developer community is just unable to comprehend the real impacts of concurrent threads reading and writing in a shared state model.

Scaling applications is difficult

The other major impact of our current architectural approach is that it doesn't scale very well as application load increases. Firstly, the thread-per-request model just doesn't make good use of resources. Application servers have a finite number of threads that they are able to handle at any one time. Going beyond this limit significantly slows down the server - even though at any time many of the threads will be blocked on I/O. Once a server has reached thread saturation, the only solutions are to speed up performance (which we have seen is very hard) so that threads are freed more quickly, or to add more application servers.

However, adding more application servers doesn't actually solve the problem. All it does is enable more concurrent threads to run, which in turn puts more load on the underlying database. Given the approach of re-retrieving data on each request we just end up overloading the database. We then need to scale the database to handle the extra load - and scaling databases is a VERY specialist job and VERY expensive as well. There's no way to achieve continued scalability with this model. Clearly an architecture based around cheap commodity hardware running app servers doesn't work if you need to massively scale the database to get any performance - so we are back to caching and shared state again, and we already explored what a nightmare these can be.

The final difficulty in scaling the application comes from the way configuration and shared session state are managed. At low scale, configuration via static config files is fine. Sharing session state through session clustering or sticky sessions is also acceptable. However, as the number of servers grows and the infrastructure becomes more complex you soon end up in configuration hell - different versions of the application with slightly different configurations, massive challenges tuning parameters on running servers and so forth. Additionally, sticky sessions become less feasible and any form of session state clustering becomes impossible. Session state therefore needs to be pushed down into the database, adding to its overload and scalability issues, or we have to start introducing distributed shared state solutions, with all the problems that entails.

So, it really doesn't work. Where do we go now?

We've seen that the current standard layered architecture that we are using to build applications causes a number of performance and scalability problems. Mostly these have come about from the desire to shield 'average' developers from the complexity of concurrent programming with shared mutable state. Unfortunately this simplicity causes performance problems and impedes scalability, forcing developers to work around the restrictions in order to build something that works. There must be a better approach that solves these problems without needing developers to become shared mutable state concurrent programming gurus!

In the next post in this series I will look at a solution to these problems using lightweight asynchronous 'actors' and immutable messages. The final post will then look at some specific technology solutions that I have employed to implement this new approach.

Tuesday, 1 February 2011

My New Appreciation of Testing

Recently I have had the benefit of working with a really good tester on an agile project. This person has consistently challenged both my terminology and approach to testing. After working with him I have come to a completely new conceptual model of automated testing (checking) vs manual (and exploratory) testing, how they are very different and how they complement each other. This is the story of what I have learnt.

A Developer Model of Testing

I've been writing automated tests as part of my development process for as long as I can remember. These used to be tests written after the code to check that it worked, then I started doing TDD and using the tests to drive the design while validating that it also worked as expected. Recently I've also added BDD into the mix as a way of driving and testing requirements. However, all this time my mental model of testing has been somewhat fixed and, as I now realise, viewed solely from a development perspective. Let me try to describe this view.

As a developer, my goal is simple: write some code such that:

  • given a known starting state;
  • applying a known sequence of operations;
  • with a defined set of input values;
  • then I get an expected and valid output state

My typical approach would be to write tests that:

  1. Set up some fixture code representing a state that I want to start from
  2. Define a test that applies an operation with valid input values
  3. Define some assertions that should be true about the state after the operation has completed
  4. Implement and refactor the code until the assertions pass

I'd then go on to create some additional variations of the above test with alternative applications of the operation, adjusting the implementation code so that each test's assertions pass. Typically the alternative tests would cover a mix of:

  • Apply an operation with different valid input values
  • Apply an operation with invalid input values
  • Apply an operation with boundary input values
  • Apply an operation with non-supplied input values
  • Apply an operation with potentially harmful input values
  • Apply an operation forcing different failure cases in underlying code, subsystems or boundaries
  • Apply operations involving concurrency or synchronisation
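Framework aside, the pattern behind the steps and variations above can be sketched in plain Java. The Account class and its rules here are entirely hypothetical, and a real suite would of course use a test framework such as JUnit rather than boolean-returning methods:

```java
public class AccountChecks {
    // Hypothetical domain class, just to give the checks something to exercise.
    static class Account {
        private int balance;
        Account(int openingBalance) { this.balance = openingBalance; }
        void withdraw(int amount) {
            if (amount < 0) throw new IllegalArgumentException("negative amount");
            if (amount > balance) throw new IllegalStateException("insufficient funds");
            balance -= amount;
        }
        int balance() { return balance; }
    }

    // Steps 1-3: fixture, operation with valid input, assert the output state.
    public static boolean withdrawalWithValidInput() {
        Account account = new Account(100);   // 1. known starting state
        account.withdraw(30);                 // 2. operation with valid input
        return account.balance() == 70;       // 3. expected output state
    }

    // A variation: the same operation with an invalid input value.
    public static boolean withdrawalWithInvalidInput() {
        Account account = new Account(100);
        try {
            account.withdraw(-5);
            return false;                     // should have been rejected
        } catch (IllegalArgumentException expected) {
            return account.balance() == 100;  // state left untouched
        }
    }
}
```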

As a conscientious developer I'd be trying to write automated tests that cover the widest range of scenarios so that I could be confident my code would work as expected in all situations. When describing my progress in a stand-up I'd be using phrases such as:

  • "The code is done and is working because all the tests pass"
  • "The tests show we've fixed all the defects"
  • "We've made the change and the tests show that we haven't broken anything"

From a developer point of view the functionality would then be handed over to a QA team member to check, and perhaps for them to write some higher-level tests to ensure the code meets the functional requirement (i.e. given a good set of data and a good set of inputs then the correct result is observed). Code complete, story done, into production, next please!

The Awkward Tester

The above was pretty much my approach to testing and is, I think, pretty typical of the approach taken by most developers. However, the excellent tester on our team started to question the statements that we made about our code. For example:

  • The code is done and working because all the tests pass: "Your tests show that your code makes the tests pass, but they don't prove that it works"
  • The tests show we've fixed all the defects: "The tests actually just show that you've fixed the specific identified defect case(s), all the other unknown ones are still there"
  • We've made the change and the tests show that we haven't broken anything: "No, the tests show that you haven't broken the paths covered by the tests, there's probably other things that are broken though"

My initial reaction was to just laugh this off, put the tester down as an awkward, pedantic sod and carry on as before. However, based on his well deserved reputation and previous work record, I trusted and respected the skills of this particular person. This sowed some seeds of doubt in my mind: was it just the language us developers were using that he had problems with, or was he really talking about some fundamental wider view of testing that we, as developers, were all missing?

My New Developer Model of Testing

After some serious contemplation, introspection (and a number of very long lunchtime walks) I have come to realise that what I have been considering as 'comprehensive' automated testing is really just a small part of a much wider testing world. By focusing so much on these automated test suites I had become blind to the wider implications of my code quality. Our excellent tester friend calls what we do automated 'checks', not tests, and I think he might be right! What we are building are pieces of non-deployed code that check that our real code does what we expect it to do in a certain set of scenarios. Testing is so much more. Let me explain my current thinking...

In automated checks we first set up fixtures that represent known starting states for our application, component or class. For all but the most trivial examples these states will be only a small subset of all the possible states that the application could be in. My checks then pick an operation or set of operations (i.e. a feature) to evaluate against this state. Again, for anything but the most trivial single-function cases, the operations we pick and the order they are called in will represent just a small subset of all possible operations and combinations of those operations.

Next we look at input values and their combinations and the same subset rules apply. This is also true for failure scenarios that we might check. Finally, we assert that our code leaves the application in a consistent state, with those states again being just a small subset of the possible states my application could be in. We then clear down our test fixtures and move onto the next test. This can be clearly shown in the following diagram:

The Automated Check View of Testing

Now, our tester has a different view. He is trying to identify possible places where my application might break or exhibit a defect. He is therefore using his skills, knowledge and experience to pick combinations of specific states, operations, input values and failure scenarios that are both likely to occur and most likely to find the defects in the application. These may be inside the subsets used by the automated checks, but can also be outside, from the wider set of possibilities.

Additionally, the tester has another weapon in their arsenal in that they can use the output state of one set of tests as the input state for the next - something that is not so easy to do in automated tests that might be run in a different order or individually. We can see the tester's approach to testing in the following diagram:

The Tester View of Testing

It is clear to me now that the two approaches are both necessary and complementary. Checking has far less value if testing is not also applied. Testing would also be far more expansive and time consuming if checking was not in place to find and eliminate a certain percentage of defects at development time.

The Role of the Tester

So, what is the role of the tester? Most developers would tend to assume that the tester's job is to prove that their code works (and I will even admit that this was probably the way I used to think). Go on, admit it developers: how many times have we cursed the tester under our breath when they have found a critical flaw in our code a few days/hours before it's due to be released? Come on, we know we've all done it!

Turns out we were wrong! The role of a tester is to prove that our code doesn't work, and to prove it before the code gets into production and is broken by a real user or customer. Why should a tester be trying to prove that my code works? Isn't that why I've invested so much time and effort into building my automated suite of checks - to give me confidence that my code works in the situations that I have identified? What I really want is to find out where my code doesn't work, so that I can identify the omissions from my suite of checks and fix the broken code paths. This is what a good tester should be doing for me.

First and foremost, a good tester should be using their skills to find scenarios where my code fails and leaves the application in a broken and inconsistent state. This should be the Holy Grail for a tester and they should be rewarded for finding these problems. Who wants a production outage caused by a defect in their code? Not me, that's for sure. Next they should be finding cases where the application fails non-cleanly but is not broken and can continue to function. If they can do either of these things then they are worth their weight in salary/day rate! A tester that can only find the places where the code fails cleanly or does not fail at all is really not of much value: my automated checks have already proved this! I know I will certainly be valuing my testers much more than I used to. And you should too.

So how, as developers, do we aid our able testers in finding the defects that are in our code? We have to help them understand how the bits of the system fit together; we have to explain the approaches we have used to determine our automated checking scenarios; we have to help them understand where the failure points might be; and we have to make sure they know the expected usage patterns that we have coded to. Given all this information, our excellent tester should be able to find the gaps, the failure scenarios and the unexpected usages that expose the defects that our automated checks do not. When they find them we should be glad, and keen to find out how they managed to get our lovingly created code to break. We then fix the problem, add checks for those paths that we missed and learn the lessons so that we don't repeat the same defect-creating pattern in any future code we write.

The Stand-up Revisited

So, given this new understanding of testing, testers and automated checks, what should the progress descriptions in the stand-up really be:

  • "The code implementation is complete and my checks for the main scenarios all pass. Mr Tester, can you find the defects please?"
  • "I've fixed the specific path(s) which I identified as being a cause of the reported defect and added an automated check. We need to test to find the other paths that might be causing the same problem."
  • "The existing checks all run against the new code. We need to test to see which other paths or aspects of features have been broken by these changes."

See the difference? No assumptions that just because an automated suite of checks has completed successfully the code works, is free of defects, or hasn't broken the application somewhere. The automated checks can reduce the probability of bugs or issues escaping the development process and can give me more confidence in my code. However, some defects and problems are still likely to be present and my application will still break under certain conditions that my checks don't cover. I now say: "...over to you, Mr. Tester. I'll explain what I've done and you can help me find where it's broken!"