Friday 10 December 2010

Sprinting with BDD

After some initial experiments with BDD using Cucumber, our team has decided to adopt a full BDD and TDD approach for our current sprint. It's slightly experimental at the moment as we work out the best ways of working. Once we are done we will retrospect and see if we want to adopt the approach going forward. Initial feelings from our first couple of days are that it is going well. Here's what we have done so far...

Sprint Planning

After deciding to give BDD and TDD a try for the sprint, the first thing was the sprint planning. Fortunately for this sprint we have a story that is pretty well understood, but sufficiently challenging to be of value in evaluating the approach. The story basically consists of taking some existing data, retrieving additional data from a new remote system and delivering the combination of the two to a remote destination (basic decoration of existing content).

Our sprint planning consisted of two parts:

  1. a high level technical discussion of the story and agreement of the general architectural structure and approach
  2. the writing of Cucumber 'Feature' files containing all of the acceptance scenarios for the story

These two parts went well and we created a good set of acceptance scenarios, but we then found ourselves struggling to get our heads around how to go about implementing them. Our reasoning was that rather than writing out individual technical tasks, we would just create one card for each scenario. These scenario cards would then form our sprint backlog.

However, the first scenario was a 'happy path' one and we were thinking that in order to implement it we would have to build the entire solution; the remaining scenarios would then just be tiny additions. This didn't feel right in that we would then have one scenario stuck as 'in progress' for days and then all of them moving within a few hours right at the end of the sprint. Our next thought was to therefore break down this scenario into a number of technical tasks. As it turns out, this wasn't the correct approach and it soon became clear that by doing this we were missing the benefits of BDD and TDD: we were defining the detailed solution in the planning session, rather than letting it emerge from the implementation of the tests.

We decided to take a break and think over our approach. It was then that it came to us that we were still thinking in a non-TDD mindset. To pass the first scenario, we didn't need to implement the whole solution, just enough to make the BDD test pass. As we then moved on to the next scenario we could implement more of the functionality, add depth and so on. With this in mind we decided to start the sprint and inspect and adapt as we went.

Setup and Configuration

The first step in the sprint was to add Cucumber support to our project. Given that we are using BDD for acceptance tests, we decided to add this support into our functional test suite. Unit-level TDD will continue to be done using our existing testing tools (JUnit, Hamcrest and Mockito).

Our functional tests are written in Java and are built and executed using Maven. We therefore added a dependency on cuke4duke, which runs Cucumber on top of the JRuby platform. This setup included adding a dependency on the maven-cuke4duke-plugin, which runs the Cucumber features within a Maven test phase.

Aside: Why did we pick Cucumber (which is a Ruby framework) when we are doing a Java project? There were a number of reasons, but the main ones were:

  • We liked the HTML report format
  • A number of the team are already using Cucumber on external projects (in Ruby and Scala) so there was an existing familiarity
  • It seems to keep out of the way and be less invasive than many of the Java based alternatives

With this set up, we added the feature file that we built during sprint planning into the appropriate features directory and created step definitions with pending methods for each of the Given/When/Then cases.

The First Scenario

The first scenario was to set up some existing content and some additional content, decorate the existing content with the additional content and then verify that the content delivered at the destination had been correctly decorated. The steps for Cucumber were written accordingly and the assertions added to the 'Then' case. Now it was time to implement.

This was where our TDD thinking really kicked in. Previously we would have started building the whole solution. Instead we just implemented the elements of the destination required to allow us to verify the decorated content. Given that we were only testing with one piece of content, we just hardcoded it to return a fixed set of additional data. Test passed.

Now, we know that this is not the full solution and that we have plenty more work to do, but we have a green on our test report. We have 8 scenarios to complete and the story is not done until they all pass, so what's wrong with this approach? Our next scenario tests with a different set of content and additional data, which breaks the hardcoded implementation, so we can then work on the next step of delivering different sets of additional data to the destination. In this case we will still probably hardcode the data that is delivered based on the content, but progress will be made. Later scenarios will then add the calls to the new remote system to get actual data. By the end of the scenarios we will have built the entire solution, driven solely by tests.
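
As a purely illustrative sketch (in Scala rather than the Java we are actually using, and with made-up names), the 'just enough' implementation for the first scenario might look like this - the hard-coded data is exactly what the next scenario will force us to replace:

// Illustrative only - hypothetical names, not our real (Java) code.
// Just enough implementation to make the first scenario pass.
trait AdditionalDataSource {
  def additionalDataFor(contentId: String): List[String]
}

// Hard-coded for the single piece of content used by the first scenario.
// The second scenario uses different content, so this stub will have to grow,
// and later scenarios will drive out the real call to the remote system.
class HardCodedAdditionalDataSource extends AdditionalDataSource {
  def additionalDataFor(contentId: String) = List("fixed-item-1", "fixed-item-2")
}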

Areas of Uncertainty

Even though things are going well, we still have a number of areas of uncertainty that we need to resolve by the end of the sprint. These include:

  • Whether we can successfully complete the story in this way
  • Whether this approach impacts our productivity
  • Whether the team members are able to adapt to working in this TDD style
  • Whether the business will identify with the inherent value of this approach
  • Whether we can get the BAs and the POs to start thinking of the stories as BDD scenarios

We hope to resolve these as the sprint progresses. I do think that initially productivity will be reduced due to the time it takes to learn a new approach and to become proficient in it. However, while productivity may be reduced for a short time, I believe that quality will go up due to the test-driven nature of the work and that we will be more certain of building the correct solution. It is my hope that the team members will see the value of this approach and will opt to continue with it for a few more sprints to allow them all time to become proficient. We will then be able to evaluate its success or failure more objectively.

To help the business understand the value of this approach (assuming of course that it is successful) I will plan to include a quick overview of the BDD approach in the Sprint Review next week. I will give a short presentation on the approach we took, show the scenarios that we created during planning, talk through the approach we took to implement them and show the Cucumber report that is produced. It will be interesting to see the reactions.

I'll post a follow-up at the end of the sprint to let you know how we got on.

Wednesday 24 November 2010

Using Cucumber with Scala and SBT

Updated 22nd June 2012: I've now released an updated plugin (version 0.5.0). This is a significant update that switches from the Ruby version of Cucumber (running on JRuby) to the new Cucumber-jvm implementation. Please see my github repository for full details: https://github.com/skipoleschris/xsbt-cucumber-plugin. The remainder of this post, while still serving as an overview, has now been superseded by this new plugin version.

In a previous post I talked about my thoughts on doing BDD using the Cucumber framework. At the time, one of the things I was struggling with was how to get Cucumber working within an SBT environment, writing step definitions in Scala. My ultimate solution was to write my own SBT plugin. Here are the details...

Cucumber on the JVM

The Cucumber framework (http://cukes.info/) comes from the world of Ruby and is distributed as a Ruby Gem. Fortunately the author of the Cucumber framework has also developed an additional library called cuke4duke (https://github.com/aslakhellesoy/cuke4duke). This library allows Cucumber to be run on the JRuby platform (JRuby is an implementation of Ruby that runs on the JVM). It includes a Ruby Gem that allows step definitions to be written in JVM languages, plus library classes to support step definition development in languages such as Java, Scala, Groovy and Clojure. For my particular purpose, the Scala support is what I am after.

It's relatively easy to install JRuby and then from there install the Ruby Gems for Cucumber and cuke4duke. This gives good command-line support for running Cucumber features with step definitions written and compiled using Scala and existing Scala tools.
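
For reference, assuming JRuby is already installed and on the path, the gem installs are along these lines (the exact commands may vary with your JRuby and RubyGems versions):

jruby -S gem install cucumber
jruby -S gem install cuke4duke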

Scala Tools Support

My current build tool of choice for Scala is the excellent SBT (http://code.google.com/p/simple-build-tool/). If you aren't using this for your Scala projects then I suggest you take a look. A quick search identified a possible sbt plugin for running cuke4duke (written by rubbish and available at https://github.com/rubbish/cuke4duke-sbt-plugin). Unfortunately this plugin hasn't been updated since June and it is running against very old versions of Cucumber and cuke4duke. My first solution was to clone this plugin and try to update it to the newest versions. Unfortunately I was unable to get it to work properly. I was also aware that this plugin was a very early version and was lacking a number of the features that I required. Time to write my own.

My Cucumber SBT Plugin

Using some of the basic concepts from rubbish's solution, plus my own ideas as to how it should work, I put together my cucumber-sbt-plugin. This is an SBT plugin project and is implemented in Scala. Full details (and code) can be obtained from my github: https://github.com/skipoleschris/cucumber-sbt-plugin.

Some of the main features of the plugin include:

  • Automated dependency management of JRuby and cuke4duke
  • Automated install of all required gems into the lib_managed directory
  • Support for multiple cucumber goals offering: console, console (detailed), html and pdf output
  • Extensive configuration options through overrides in the SBT Project file
  • Support for 'tag' and 'name' parameters being passed to each cucumber task

Project Setup

In the plugin definition file (project/plugins/Plugin.scala), add the cucumber-sbt-plugin dependency:

import sbt._

class Plugins(info: ProjectInfo) extends PluginDefinition(info) {
  val templemoreRepo = "templemore repo" at "http://templemore.co.uk/repo"
  val cucumberPlugin = "templemore" % "cucumber-sbt-plugin" % "0.4.1"
}

In your project file (i.e. project/build/TestProject.scala), mixin the CucumberProject trait:

import sbt._
import templemore.sbt.CucumberProject

class TestProject(info: ProjectInfo) extends DefaultWebProject(info) with CucumberProject {

  // Test Dependencies
  val scalatest = "org.scalatest" % "scalatest" % "1.2" % "test"

  //...
}

Writing Features

Features are written in text format and are placed in .feature files inside the 'features' directory. For more info on writing features please see the Cucumber website. For example:

Feature: Cucumber
  In order to implement BDD in my Scala project
  As a developer
  I want to be able to run Cucumber from with SBT

  Scenario: Execute feature with console output
    Given A SBT project
    When I run the cucumber goal
    Then Cucumber is executed against my features and step definitions

Writing Step Definitions in Scala

Step definitions can be written in Scala, using the cuke4duke Scala DSL. More information on this API can be obtained from the cuke4duke wiki page for Scala. For example:

import cuke4duke.{EN, ScalaDsl}
import org.scalatest.matchers.ShouldMatchers

class CucumberSteps extends ScalaDsl with EN with ShouldMatchers {

  private var givenCalled = false
  private var whenCalled = false

  Given("""^A SBT project$""") {
    givenCalled = true
  }

  When("""^I run the cucumber goal$""") {
    whenCalled = true
  }

  Then("""^Cucumber is executed against my features and step definitions$""") {
    givenCalled should be (true)
    whenCalled should be (true)
  }
}

Using The Plugin Commands

Just run one of the cucumber actions to run all of the cucumber features. Features go in a 'features' directory at the root of the project. Step definitions go in 'src/test/scala'. The following actions are supported:

  • cucumber - Runs the cucumber tool with pretty output to the console and source and snippets turned off
  • cucumber-dev - Runs the cucumber tool with pretty output to the console and source and snippets turned on
  • cucumber-html - Runs the cucumber tool and generates an output cucumber.html file in the target directory
  • cucumber-pdf - Runs the cucumber tool and generates an output cucumber.pdf file in the target directory

There are also parameterised versions of each of these tasks (see IMPORTANT NOTE below):

  • cuke
  • cuke-dev
  • cuke-html
  • cuke-pdf

Each of these tasks also accepts parameter arguments, e.g.:

cuke @demo,~@in-progress

would run features tagged as @demo and not those tagged as @in-progress. Also:

cuke "User admin"

would run features with a name matching "User admin". Multiple arguments can be supplied; they honour the following rules:

  • arguments starting with @ or ~ will be passed to cucumber using the --tags flag
  • arguments starting with anything else will be passed to cucumber using the --name flag

IMPORTANT NOTE: The current design of sbt prevents tasks with parameters (method tasks) being run against the parent project in a multi-module sbt project. This is why there are separate tasks with parameters. To use a parameter task you must first select a child project. The non-parameter tasks can be run against the parent project or a selected child.

Saturday 20 November 2010

The Best Way to Apply BDD

I am currently in the process of rewriting a simple e-commerce application for one of my customers. The new version is implemented using Scala and the excellent Lift framework and is a rewrite of a six year old struts/JSP version. The core application is largely identical but I am adding a wide range of new and improved administration functions.

Typically I build a simple, well understood application such as this using ordinary TDD principles to create unit tests and drive out the code. I'd also usually add some fairly basic UI tests using something like Selenium or even just HtmlUnit to validate the front-end. However, I like to use these sorts of projects to explore better ways of doing things. I've therefore decided to use a Behaviour Driven Development approach using Aslak Hellesøy's excellent Cucumber framework. (I'll post more details on installing and using Cucumber and cuke4duke with sbt once I get it all figured out!)

In this post I want to explore some of my thinking about exactly how and where to use BDD within the project and the places where I want to integrate the Cucumber features and scenarios.

What To Use BDD For

I am a strong advocate of the TDD approach, and I intend to follow this within this development for the low level details (i.e. unit and component tests). I'm therefore looking at using BDD as a way of capturing the higher-level business requirements and acceptance criteria. My aim is to use BDD and Cucumber to ensure that the application I am building actually does what the business expects it to. Lower level tests will still be written using ScalaTest to ensure the code that I am writing does what I expect it to do and also to drive out the architectural and implementation details.

Where To Integrate BDD

Given this positioning for BDD, where I have been struggling is deciding the correct level in my application at which to target the Cucumber step definitions (step definitions are the code that actually executes the scenario and conditions described by the behaviour). The following sections describe the different options I have considered and the advantages and disadvantages of each:

BDD at the User Interface - In this approach the step definitions are implemented to execute against the running web application. This is usually achieved using a technology such as Selenium or HtmlUnit. Each scenario loads web pages, clicks links, fills and submits forms and makes assertions about the HTML documents and associated resources that are returned.

Advantages:

  • The behaviour is asserted across the complete application.
  • The BDD approach can be used to drive the development of the UI as well as the application logic.

Disadvantages:

  • You can only execute the feature steps against a built and deployed application - which may make development slower.
  • A web-based UI is usually the least stable part of a web application and thus feature steps may be broken as the UI is tweaked, making it appear that behaviour is being lost. Step definitions may become quite fragile.
  • Using BDD at this level may not fit well when user experience or visual designers are involved, as this often results in rapid changes to the UI.
  • Generally, testing at this level requires a lot of boilerplate code to be written - which makes writing BDD step definitions more complex.
  • Testing of web-based user interface may be better suited to a suite of UI tests that can focus just on verifying the UI display and interaction.

BDD at the Web Protocol - In this approach the step definitions are written to directly exercise the web protocol used to interface to the application. For example, the steps make HTTP calls to the application and assert that the returned results are correct and contain the expected content.

Advantages:

  • Does not need to worry about the complexities of web pages, javascript and so on that are perhaps better tested in a dedicated suite of UI tests.
  • Very well suited to RESTful or other web services that return structured data instead of HTML documents.

Disadvantages:

  • You can still only execute the feature steps against a built and deployed application
  • Building complex sequences of requests to simulate things such as Ajax may be more complex than testing against the UI.
  • Asserting against returned HTML content may be more complex than using assertions in a framework like Selenium at the UI layer.
  • Building and testing at this layer requires that you also support a suite of UI tests.
  • For a web application, the interface at the HTTP layer may change quite frequently, requiring frequent changes to the step definitions. This will be particularly common in an Ajax heavy application.
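
As a rough illustration of the web-protocol approach, a step definition can simply fetch a URL and assert on the returned body. This is only a sketch - the URL, page content and step wording are made up, and a locally deployed instance of the application is assumed - using the cuke4duke Scala DSL with ScalaTest matchers:

import cuke4duke.{EN, ScalaDsl}
import org.scalatest.matchers.ShouldMatchers
import scala.io.Source

class HttpLevelSteps extends ScalaDsl with EN with ShouldMatchers {

  private var responseBody = ""

  When("""^I request the product listing page$""") {
    // Hypothetical URL - assumes the application is built and deployed locally
    responseBody = Source.fromURL("http://localhost:8080/products", "UTF-8").mkString
  }

  Then("""^the response contains the featured product name$""") {
    // Hypothetical expected content
    responseBody should include ("Example Product")
  }
}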

BDD at the Controllers - In this approach, we ignore the UI and view parts of the application. Instead we wire up the whole application from the controllers/snippets layer down. Step definitions are then written that invoke the controllers/snippets with pre-defined requests and assert that the response action and data that would be used to generate a view is correct.

Advantages:

  • Features can be tested as part of the test phase of a build - no need to deploy.
  • Testing at this level tends to be more stable and less fragile as the application emerges.
  • It is usually much simpler to invoke and assert at a code level rather than a UI level

Disadvantages:

  • A good set of testing features is required by the chosen web framework, so that requests and responses can be easily simulated and asserted against.
  • Tests at this level tend to devote a significant portion of their logic to interactions with the UI rather than to asserting against business rules.
  • A suite of UI tests is required in order to test the UI and these will also generally exercise the same controllers/snippets in order to drive the UI.

BDD at the Services - In this final approach we test behaviour at the level directly below the controller/snippet. This is usually testing against service-level code, but may also require writing some step definitions against domain-level objects. At this level we are much more interested in testing the business behaviour of the application than the behaviour of how the application interacts with the user. (A sketch of a service-level step definition follows the lists below.)

Advantages:

  • Features can be tested as part of the test phase of a build - no need to deploy.
  • Code at this level will usually be more stable than at the higher levels, thus making the tests less brittle.
  • Step definitions at this level will usually be simpler to write and maintain than those written against the UI or controllers.
  • We are testing against actual business rules rather than how a user might interact with the application.
  • We require fewer testing-specific tools and less framework support.

Disadvantages:

  • Testing at this level exercises less of the application flow than the tests at higher levels.
  • A suite of tests is additionally required to validate the UI and the controllers to ensure that the user interaction with the system calls the services with the correct data in the correct order and correctly displays the results.
  • The business is more likely to expect that the behaviour they experience when interacting with the application is what gets verified.
  • Developers must be VERY disciplined to ensure that no business rules, logic or behaviour is implemented in the controllers layer.
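
To make the service-level option concrete, here is a hypothetical sketch of a step definition class that exercises a (made-up, inlined) service directly, using the cuke4duke Scala DSL with ScalaTest matchers - no deployment, HTTP or UI involved:

import cuke4duke.{EN, ScalaDsl}
import org.scalatest.matchers.ShouldMatchers

class OrderTotalSteps extends ScalaDsl with EN with ShouldMatchers {

  // A tiny in-memory "service" defined inline so the sketch is self-contained.
  // In reality the steps would call the application's actual service layer.
  class OrderService {
    private var prices = List[BigDecimal]()
    def addItem(price: BigDecimal) { prices = price :: prices }
    def calculateTotal() = prices.sum
  }

  private val service = new OrderService
  private var total = BigDecimal(0)

  Given("""^a basket containing items priced 10.00 and 15.00$""") {
    service.addItem(BigDecimal("10.00"))
    service.addItem(BigDecimal("15.00"))
  }

  When("""^the order total is calculated$""") {
    total = service.calculateTotal()
  }

  Then("""^the order total is 25.00$""") {
    total should be (BigDecimal("25.00"))
  }
}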

So, Where From Here?

After looking at all the possible places to use BDD and Cucumber, I have to draw the conclusion that there is no outright winner. It feels to me that the best mix is a combination of step definitions that test behaviour at the service layer (to verify business rules and behaviour) plus another set that test behaviour of the UI. By combining both of these approaches it should be possible to cover the application sufficiently to have confidence that its behaviour is correct.

This then leads me on to think about the best way to create the feature descriptions of the behaviour. Do we create a single description of the desired behaviour and then write two step definitions - one for the service layer and one for the UI? Or do we write a feature definition describing the required behaviour for the business services and then have an alternative behaviour specific for the user interaction with the system?

I'm still unsure of the correct answers to these questions. Anyone out there got any thoughts or experience? In the meantime I'm going to build my application trying both approaches and see which works out best.

Wednesday 17 November 2010

Why Releases Should be Automated and Environments Equivalent

Last weekend I was involved in a fairly major release for one of my customers. Ultimately we were unable to complete the release due to a number of factors (some within our control and others outside). However, the release was still deemed successful as we were able to cleanly abort and rollback to the previous working version with an absolutely minimal service outage. The release did however highlight a number of places where the release process could be improved.

So, where did this release go wrong? There were three main problems:
  • A switch failure in one of the data centres that temporarily rendered some servers inaccessible.
  • Mistakes in the manual actions in the release plan.
  • A data inconsistency between two different data centres.

Unexpected Events During a Release

Murphy's Law: Anything that can go wrong, will go wrong.

Unfortunately, no matter how well you plan, there will always be things that you can't anticipate. In this particular release we had a hardware switch fail right in the middle of the release. An erroneous configuration in the data centre caused this failure to render a number of our servers unreachable. We were therefore mid-release with no way to go forward and an arduous rollback process to restore us to a working state.

Fortunately, due to diligent disaster recovery work undertaken previously, we were able to fail the entire site over to the alternative data centre. A pragmatic decision was eventually taken that we couldn't wait on the chance that we MIGHT be able to complete the release, so we rolled back. This was well planned and went smoothly enough that within 30 minutes of a new switch coming online we were back up and running on the original software versions. Nice work.

However, even flushed with the success of our miraculous escape from the clutches of Murphy, we have to consider what might have been. Our rollback was a largely manual process and, although tested in the test environment, our production environment is just different enough to add a level of uncertainty to the rollback process. There's also a greater chance of errors in manual steps when under the pressure of a live situation.

So how could we be more certain of our rollback plan?

Untested Manual Release Plans

Our release had a very detailed plan that had been well reviewed beforehand. A variation of the plan had been executed in test but, as I already said, the production environment is just different enough to require a different (and slightly more complex) plan. Additionally, this release had some extra complexities requiring a number of additional manual steps.

What we discovered during the release was that some of the steps specific to production were in fact not quite correct. Couple this with the chance of making errors when executing manual steps and we had a number of places with a high potential for failure.

While the plan did run pretty smoothly, how could we reduce the chances of errors and mistakes?

Problems That Can't be Found Before Production

The final problem encountered on the release was one of database consistency. Databases in two different data centres were thought to be consistent. It turns out they were not! With hindsight, we should have checked this and been continually monitoring them over time. We live and learn.

What should really have happened is that we detected the possibility of inconsistent databases before we got to production. Unfortunately all the lower environments have only a single data centre model for the database in question, so there was no way that anything could ever get out of sync in those cases. Hence, no way to detect the problem before reaching production.

How can we find these sorts of problems earlier?

Automate Releases, Equivalent Environments

Fortunately, all of the above problems can be fixed with two solutions. I wanted to say SIMPLE solutions, but unfortunately they are not particularly simple and require a fair amount of work to get right. These solutions are:
  1. Automate the release and rollback pipelines.
  2. Ensure Integration and Test environments are equivalent to Production.

Automated Releases

In a modern computing environment there is really no need to include manual steps in a release. Even when releases have lots of complexity it should be possible to script all the required steps. An automated script can be tested, fixed and re-tested many times in order to ensure it is correct before running it against a production environment. This is much better than a manual process that is difficult to test and open to human error.
Don't get me wrong, creating automated releases is hard, but the benefit of smoother, faster and more frequent releases to production gives significant enough business benefit to make the effort worthwhile.

Equivalent Environments

A manufacturer building a product wouldn't create a prototype and then go straight into mass production with a design that varied significantly from that prototype. Companies building software shouldn't do so either. The purpose of Continuous Integration is to find problems early when they are quicker and cheaper to fix. In order to work well, the lower environments need to match production in terms of applications, server structure, configuration and so on.
Additionally, if we are to build automated deployment plans then we need production like environments to develop and test them against. An unproven automated plan is of no more value than a fully manual one.

Conclusions

By automating releases and having equivalent environments we give ourselves the best possible chance of undertaking regular, successful and stress-free releases. When Murphy does come along, knowing we have an automated way to get back to a working system saves a whole lot of worry and stress. Yes, it's hard to get to this point, but the long-term benefit to a business from doing so is worth every penny and every hour spent.

Wednesday 10 November 2010

Introduction to SBT

I'm at the London Scala User Group monthly meeting. We're here at the Skillsmatter Exchange and we are learning about Simple Build Tool (sbt). Here's my live blog entry on the session. Sbt can be found at http://code.google.com/p/simple-build-tool/

Running/Using SBT

Once SBT is installed, you can run it in a new directory. It will ask if you want to create a new project, prompt you for a few details and then set up the basic project template:

/Users/chris/temp/sbt 6% sbt
Project does not exist, create new project? (y/N/s) y
Name: demo
Organization: skipoleschris
Version [1.0]: 
Scala version [2.7.7]: 2.8.1
sbt version [0.7.4]: 

Template structure follows maven directory conventions:

lib - directory for adhoc libraries
project - sbt project files
src - main/scala, main/resources, test/scala, test/resources ala maven
target - where all the built stuff goes

Start up sbt and you can now build, test, package and run:

> run
[info] 
[info] == copy-resources ==
[info] == copy-resources ==
[info] 
[info] == compile ==
[info]   Source analysis: 1 new/modified, 0 indirectly invalidated, 0 removed.
[info] Compiling main sources...
[info] Compilation successful.
[info]   Post-analysis: 2 classes.
[info] == compile ==
[info] 
[info] == run ==
[info] Running Demo 
Hello World
[info] == run ==
[success] Successful.
[info] 
[info] Total time: 5 s, completed 10-Nov-2010 18:53:12

We can turn off some of the verbosity by setting the log level:

> warn
Set log level to warn

To customise our project we create a Scala project file in the project/build directory:

import sbt._

class DemoProject(info: ProjectInfo) extends DefaultProject(info) {

  // Test dependencies
  val scalatest = "org.scalatest" % "scalatest" % "1.2" % "test"
}

You can see that dependencies are specified in a very similar way to the Maven group/artifactId/version/scope model. We then reload the project definition and run update to pull down the dependencies:

> reload
[info] Recompiling project definition...
[info]    Source analysis: 1 new/modified, 0 indirectly invalidated, 0 removed.
[info] Building project demo 1.0 against Scala 2.8.0
[info]    using DemoProject with sbt 0.7.4 and Scala 2.7.7
> update
[info] 
[info] == update ==
[info] :: retrieving :: skipoleschris#demo_2.8.0 [sync]
[info]  confs: [compile, runtime, test, provided, system, optional, sources, javadoc]
[info]  1 artifacts copied, 0 already retrieved (1742kB/44ms)
[info] == update ==
[success] Successful.

Some other important/useful sbt commands:

clean-cache - remove cache files
test - compile & run tests
test-quick - compile & run only tests that have changed
console - starts a scala console

It is also possible to prefix any command with ~, which runs the same command each time changes to source files are detected. Very useful with test-quick.

sbt can also build mixed Java and Scala projects and with a little bit of configuration (github.com/szeiger/junit-interface) can also run JUnit tests as part of the build process (by default it runs just scalatest and specs tests).

Another useful command for web developers is: jetty-run, which starts up a Jetty server and deploys the application to it:

> jetty-run
[info] 
[info] == compile ==
[info]   Source analysis: 0 new/modified, 0 indirectly invalidated, 0 removed.
[info] Compiling main sources...
[info] Nothing to compile.
[info]   Post-analysis: 2 classes.
[info] == compile ==
[info] 
[info] == copy-resources ==
[info] == copy-resources ==
[info] 
[info] == prepare-webapp ==
[info] == prepare-webapp ==
[info] 
[info] == jetty-run ==
2010-11-10 19:11:10.505:INFO::Logging to StdErrLog::DEBUG=false via org.eclipse.jetty.util.log.StdErrLog
[info] jetty-7.0.2.RC0
[info] NO JSP Support for /, did not find org.apache.jasper.servlet.JspServlet
[info] Started SelectChannelConnector@0.0.0.0:8080
[info] == jetty-run ==

Web apps are supported with a simple change to the scala project build file:

import sbt._

class DemoProject(info: ProjectInfo) extends DefaultWebProject(info) {

  // Test dependencies
  val jetty7 = "org.eclipse.jetty" % "jetty-webapp" % "7.0.2.RC0" % "test"
  val scalatest = "org.scalatest" % "scalatest" % "1.2" % "test"
}

SBT Plugins

SBT supports the development and use of plugins. To make a new plugin, the project definition extends PluginProject as opposed to DefaultProject or DefaultWebProject.

For example, we can create a Scala trait that extends BasicScalaProject and do things such as registering new test listeners for generating reports and so on. Plugins must be developed using Scala 2.7.7 as this is the version that SBT itself is compiled with. Good tip: develop the trait in a normal project and then, when it's working, make it into a plugin project.
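
As a minimal sketch (hypothetical names), the plugin project definition and the trait it provides might look like this:

import sbt._

// The plugin's own project definition - note PluginProject rather than DefaultProject
class TestReportPluginProject(info: ProjectInfo) extends PluginProject(info)

// The behaviour the plugin offers, mixed into projects that want to use it
trait TestReporting extends BasicScalaProject {
  // register extra test listeners, define additional actions, etc. here
}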

giter8

A tool that generates files and directories from templates published on github. Can be found at https://github.com/n8han/giter8. Provides similar (but obviously much better) functionality to the Maven archetype plugin. Currently there is a limited set of templates, but this should grow over time. In particular there is a template for building Android projects in sbt. E.g. g8 gseitz/android-sbt-plugin

You can also build your own giter8 templates and upload them to your github repository.

IDE Integration

There are SBT plugins for most IDEs that generate IDE project files, although many developers simply keep SBT open in a separate window using the ~ command.

Well, that's about it for the SBT overview. Hope you found something of use.

Friday 5 November 2010

Functional Programming Challenge: Most Frequent Multiple Occurring List Item - SOLUTIONS

In a previous post I set out a challenge to come up with the best solution to the following problem: how to find the most frequently multiple-occurring item in a list.

I had created my own solution, but felt that a better one was possible. The criteria specified were simple code, elegance and performance.

I have had two submissions to the problem so far and this post offers a comparison of them along with my original solution. My comparisons were carried out using the following two test lists:

val random = new java.util.Random(2L)
val list1 = (for ( i <- 1 to 100000 ) yield random.nextInt).toList.distinct
val list2 = (for ( i <- 1 to 100000 ) yield random.nextInt(1000)).toList

The first list contains no duplicate elements, so the result should always be None. The second list will contain multiple duplicate elements.
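
For anyone wanting to reproduce the comparison, a simple timing harness along these lines is enough (a sketch of one possible approach, not necessarily how the original timings were taken):

def time[T](label: String)(block: => T): T = {
  val start = System.nanoTime
  val result = block
  println(label + ": " + ((System.nanoTime - start) / 1000000) + " ms")
  result
}

// For example:
time("solution 1, no duplicates") { highestMultipleFrequency1(list1) }
time("solution 1, with duplicates") { highestMultipleFrequency1(list2) }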

My Solution

My solution to the problem was based around building a map of list items to frequency count and then finding the most frequent multiply occurring item in that map. My implementation was achieved with two small functions and two foldLeft operations:

def highestMultipleFrequency1[T](items: List[T]): Option[T] = {
  type Frequencies = Map[T, Int]
  type Frequency = Pair[T, Int]

  def freq(acc: Frequencies, item: T) = acc.contains(item) match {
    case true => acc + Pair(item, acc(item) + 1)
    case _ => acc + Pair(item, 1)
  }
  def mostFrequent(minOccurs: Int)(acc: Option[Frequency], item: Frequency) = acc match {
    case None if item._2 >= minOccurs => Some(item)
    case Some((value, count)) if item._2 > count => Some(item)
    case _ => acc
  }
  items.foldLeft(Map[T, Int]())(freq).foldLeft[Option[Frequency]](None)(mostFrequent(2)) match {
    case Some((value, count)) => Some(value)
    case _ => None
  }
}

In the tests, my solution performs fairly consistently across both test lists, the second being slightly faster due to the smaller size of the map being used by the second fold function.

Solution from Nilanjan R

This solution, sent by Nilanjan R, is significantly simpler and shorter in the amount of code it uses. It takes the approach of grouping items by themselves and then sorting the resulting lists by their size and taking the first item:

def highestMultipleFrequency2[T](items: List[T]): Option[T] = {
  items.groupBy(x => x).values.toList.sortWith((f, s) => f.size > s.size) match {
    case x :: xs if x.size >= 2 => Some(x.head)
    case _ => None
  }
}

Under test, this solution performs significantly slower than my solution on the list without duplicates. However, when the list does contain significant multiple occurring elements then it performs slightly better than my solution. To me this tends to indicate that the sort function is the major performance factor in this solution. I do really like this due to its very clean, simple and easy to understand code. It's certainly my choice for the Noughts and Crosses application where simple code is more important than raw performance and where I am only dealing with very small lists.

Solution from Antonin Brettsnajdr

The solution offered by Antonin follows a similar approach to my solution in that it builds a map of item frequencies and then finds the most frequently occurring element using this map. However, this solution implements the whole algorithm in a single tail-recursive function:

def highestMultipleFrequency3[T](items: List[T]): Option[T] = {
  def mff(l: List[T], m: Map[T, Int]): Option[T] = l match {
    case element :: tail => if (m contains element) mff(tail, m.updated(element, m(element) + 1))
                            else mff(tail, m + (element -> 1))
    case Nil => val occurences = (m map { item => item._2 }) 
                if (occurences forall (_ == 1)) None 
                else {
                  val maxOccur = occurences.max
                  m find ( letter => letter._2 == maxOccur) match {
                    case Some(x) => Option(x._1)
                    case None => None
                  }
                }
  }
  mff(items, Map())
}

This solution is by far the best performing. It is two to three times faster than my solution. There is a significant performance benefit when dealing with a list that contains multiple duplicated items. While the code is not as simple as the previous solution, it is very readable and is a nice example of tail recursion.

A big thanks to Nilanjan R and Antonin Brettsnajdr for their solutions. I found it very interesting looking at the different way people approach a problem such as this. Can you do better? Let me know your solution.

Monday 1 November 2010

Functional Programming Challenge: Most Frequent Multiple Occurring List Item

A little functional programming challenge for those who like that sort of thing. This one came up while I was improving my Noughts and Crosses example application.

The challenge is this: Given a list of elements of type T, find the element with the most multiple occurrences.

For example, the following should apply:

List("a", "b", "c", "a", "b", "a") 
    -> should return "a" as this occurs the most in the list
List("a", "b", "c")                
    -> should return nothing as no elements occur multiple times
List("a", "b", "a", "b")           
    -> can return either "a" or "b" as they both occur the same number of times

The simplest solution I have come up with so far (written in Scala) is:

def highestMultipleFrequency[T](items: List[T]): Option[T] = {
  def freq(acc: Map[T, Int], item: T) = acc.contains(item) match {
    case true => acc + Pair(item, acc(item) + 1)
    case _ => acc + Pair(item, 1)
  }
  def mostFrequent(frequencies: Map[T, Int], minCount: Int = 2) = {
    frequencies.find(_._2 == frequencies.values.max) match {
      case Some((value, count)) if count >= minCount => Some(value)
      case _ => None
    }
  }
  mostFrequent(items.foldLeft(Map[T, Int]())(freq))
}
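
Running it against the examples above gives the expected results:

highestMultipleFrequency(List("a", "b", "c", "a", "b", "a"))   // Some("a")
highestMultipleFrequency(List("a", "b", "c"))                  // None
highestMultipleFrequency(List("a", "b", "a", "b"))             // Some("a") or Some("b")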

While this works and is fairly elegant, I can't help feeling that there must be a better way. Traversing the entire list to build a map of item to frequency and then searching all the values in that map to find the one occurring the most just seems a tad clunky.

I also played around with an alternative implementation for the mostFrequent function, but I'm not sure this is any better or more efficient:

def mostFrequent(frequencies: Map[T, Int], minCount: Int = 2) = {
  frequencies.toList.map(_.swap).max match {
    case (count, value) if count >= minCount => Some(value)
    case _ => None
  }
}

Finally I came up with this solution, which requires less traversal through the map than the previous ones, even if the code is slightly longer and the generics for the foldLeft methods are less readable:

private def highestMultipleFrequency[T](items: List[T]): Option[T] = {
  type Frequencies = Map[T, Int]
  type Frequency = Pair[T, Int]

  def freq(acc: Frequencies, item: T) = acc.contains(item) match {
    case true => acc + Pair(item, acc(item) + 1)
    case _ => acc + Pair(item, 1)
  }
  def mostFrequent(acc: Option[Frequency], item: Frequency) = acc match {
    case None if item._2 >= 2 => Some(item)
    case Some((value, count)) if item._2 > count => Some(item)
    case _ => acc
  }
  items.foldLeft(Map[T, Int]())(freq).foldLeft[Option[Frequency]](None)(mostFrequent) match {
    case Some((value, count)) => Some(value)
    case _ => None
  }
}

Anyone want to offer a different (better) solution? Use the functional programming language of your choice. Points awarded for simplicity, elegance and efficiency. The winning solution will be published here and (if the provider agrees) will be put into my Noughts and Crosses example in place of the current implementation.

Friday 22 October 2010

From Java to Scala in O's and X's

I've been doing a lot of evangelising on the Scala language, mainly to Java devs that I work with. A number of them have become very interested in the language and have started making an effort to learn more about it. The most common questions that seem to come up relate to how to move from Java to Scala and the key differences (such as a functional approach, building apps through composition and mixins, the type system and so on).

With this in mind I decided to build a small application that starts as a typical Java development and over a number of versions becomes a full-blown Scala implementation. All versions of the application are maintained so that it becomes possible to step through and see each new level of progression into the Scala way of doing things. The progressions can be presented quickly in an hour or so, or it is possible to spend much longer on each one discussing design decisions and so forth. Even without the Scala focus, it's also a good application for discussing different design and implementation strategies with Java developers.

To have an application where the domain is easy to understand I picked the age-old Noughts and Crosses game (Tic-Tac-Toe for my American readers). The advantages of this game are that it is very simple to learn and understand, it has some interesting data structures and an interesting set of rules that can be explored for selecting the best move to make. It can also be implemented quite happily in a handful of classes and a few hundred lines of code. The particular approach I selected was to have the game played by two computer players who each take it in turns to pick the best move to make until one either wins or the game is drawn because the grid is full.

Aside: Initially the rules were so optimised that my implementation only ever resulted in a drawn game. The final solutions therefore ensure that the first couple of moves are randomly selected in order to introduce a bit of interest.

All of the source code that I'm going to discuss is available on my github account at: http://github.com/skipoleschris/OandX

Version 1 - Typical Java Code

After spending so much time writing Scala (and the most Scala-like Java code possible) it was surprisingly difficult dropping back into the type of Java code that you find on most projects. For this version I tried to adopt the coding style of an average Java developer. Code is written in basic Java and tests in Java using TestNG. The solution works fine but has a number of code and design smells, specifically:

  • 'null' is used to represent empty positions on the grid
  • The grid is represented in a Java Bean style and so all details of how to work with the contents of the Grid are actually outside the Grid class - leading to a pattern of utility classes that contain large amounts of domain logic
  • There is a lot of duplication in the WinScanner and MoveFinder classes
  • MoveFinder in particular makes it very difficult to separate the rules being applied from the complex code implementing them
  • MoveFinder has the most fantastic if ( ... == null ) multi-repeated check method!

Version 2 - Slightly Improved Java Code with ScalaTest

The next version improves the Java code slightly. (It's still code way below the level I would normally write, but this is an exercise in converting from Java to Scala, not in producing perfect Java code!). In particular, the Grid class now contains more domain logic; the WinScanner and MoveFinder classes have duplication removed; and the MoveFinder is implemented with a chain of filters rather than the nasty if clause. The MoveFinder in particular is still WAY too complex.

The other main feature of this version is that tests have been ported over to ScalaTest. I went for the BDD style of tests using FeatureSpec, GivenWhenThen and MustMatchers as I feel this shows the full power of DSL-like test constructs. Most Java developers I have shown this to seem to find it very readable. Typical tests now look like:

feature("The OnX grid") {

  scenario("can have a token added to it") {
    given("a new grid")
    val grid = new Grid

    when("a token is added at a given position")
    grid.addToken(Grid.MIDDLE, Token.nought)

    then("that position contains the token")
    grid.getToken(Grid.MIDDLE) must be (Token.nought)
  }

}

Hopefully much more readable!

Version 3 - Java Without Semicolons

The third version ports the Java code directly to Scala with minimal use of advanced Scala language features. All I did here was convert classes and data structures directly, remove boilerplate code and use Scala for loop syntax. At this point it just looks and reads pretty much like a Java application.

For example, here's the same method from the MoveFinder class implemented in both Java and Scala:

private static class DoubleWinPositionFinder implements PositionFinder {
    @Override
    public Position findPosition(final Grid grid, final Token token) {
        final Set positions = new HashSet();
        for ( final Line line : grid.getLinesWithMatchingTokenAndTwoSpaces(token)) {
            for ( final Position position : line.getEmptyPositions() ) {
                if ( positions.contains(position) ) return position;
                else positions.add(position);
            }
        }
        return null;
    }
}

And the direct port to Scala:

private class DoubleWinPositionFinder extends PositionFinder {
  def findPosition(grid: Grid, token: Token): Position = {
    val positions = scala.collection.mutable.Set[Position]()
    for ( line <- grid.linesWithMatchingTokenAndTwoSpaces(token) ) {
      for ( position <- line.emptyPositions.toList ) {
        if ( positions.contains(position) ) return position
        else positions.add(position)
      }
    }
    null
  }
}

This approach is a great way to get working with Scala. Just bring all your Java knowledge and coding style with you, make use of the Scala tools, test frameworks and so on and gradually move towards advanced Scala when you feel comfortable.

Version 4 - Moving Forwards With Scala

In this version, I moved to a much more Scala like model. In particular:

  • The Grid class is now immutable, returning a new Grid on each change
  • The Option[T] concept is now used as a replacement for null to allow better representation of empty positions
  • The code uses more advanced functional concepts such as map, flatMap, filter and so forth.
  • Mixins are used to compose the Grid with traits representing line handling, win scanning and move finding. This gives a much better represented Grid concept and smaller API surface area while still allowing separation of concerns into separate traits and classes

While I think this is a much better implementation than version 3, I am still not happy with the implementation of some of the line building logic and the move finder is still too complex and makes it difficult to separate the rules being applied from how they are implemented. In particular code like this (which finds all the empty positions on a list of lines) is just way too messy:

def emptyPositions(lines: List[Tuple2[List[Option[Token]], Position]]) =
  lines.filter(_._1 contains None).flatMap(empties => empties._1 zip positions(empties._2)).filter(_._1 == None).map(_._2)

This leads me to believe that while I am using some of the more advanced language features that something is wrong with my underlying data structures if the code gets this complex. However, looking at our move finder function we can see some simplification starting to take place:

private case object DoubleFreePositionFinder extends PositionFinder {
  def findPosition(token: Token) = {
    val allEmptyPositions = emptyPositions(linesWithMatchingTokenAndTwoSpaces(token))
    if ( allEmptyPositions.contains(Position(1, 1))) Some(Position(1, 1))
    else if ( allEmptyPositions.isEmpty ) None
    else Some(allEmptyPositions.head)
  }
}

Version 5 - The Final Solution

For version 5 I decided to approach the problem from a much more functional perspective. By switching my thinking from the object approach (i.e. what is my state and how can I encapsulate it) to a functional approach (i.e. what functions do I want to perform) I was able to realise an important concept. Although the noughts and crosses game is represented as a grid, all of the functions we want to apply are either on an individual position on the grid or on one of the eight lines that can be made from the grid. By switching my data structure to one more appropriate for applying these functions we end up with a significantly simplified set of functions for evaluating the game state.

For example, we can take a list of tokens, zip them with a list of positions and then apply functions across either the token part or the position part of the pair:

class Grid(tokens: List[Option[Token]]) extends TokensWithPositions
                                        with Lines
                                        with LineQueryDSL
                                        with WinCheck
                                        with MoveFinder {

  require(tokens.length == 9)

  val positions = for (row <- 0 to 2; column <- 0 to 2) yield Position(row, column)
  private val values = tokens zip Grid.positions

Lines can therefore be represented in exactly the same manner:

type Line = List[Pair[Option[Token], Position]]

This new structure greatly simplifies the rest of the application code when it comes to building and working with lines:

def emptyPositions(lines: List[Line]) = lines.flatMap(_.filter(_._1 == None).map(_._2))

The other main change in this version is separation of the rules for finding the best position for a player from how those rules are implemented. This is achieved through a custom DSL that allows defining the rules in a highly readable way. For example, the rule we saw previously now becomes:

private def doubleFreePosition(token: Token) =
  find linesHaving 2 positions Empty and 1 tokenMatching token select First take EmptyPosition

The DSL is a fairly simple one, implemented as a chain of filtering functions that reduce the lines down and then a selector that pulls the correct result from the matching lines. It actually turns out that the DSL is pretty easy to read and follow through, which also significantly simplifies the code and reduces duplication.

Although version 5 is a much better solution, there is still room for improvement and some functions that could certainly be implemented in a better way. Let me know when you find them and send me your better solution :-)

Version 6 - Adding Some Actors

The final version takes the code from version 5 and wraps actors around the classes. One actor for the game coordinator, one for the grid and one for each player. These actors are loosely coupled through message passing.

Noughts and Crosses is not the best demonstration for actors and concurrency as its turn-based nature makes it a purely sequential game. However, version 6 does allow the concept of actors to be demonstrated in an easily understood way and leads on to more detailed conversations about using actors for concurrent problems.

Conclusions

The Noughts and Crosses example shows how a Java programmer can migrate to Scala in a gradual manner, adding new Scala concepts as they become more comfortable. However, it's important to point out that Scala is certainly not a silver bullet for good software. Like all development it requires careful thought, good design, solid programming practice, good testing and constant refactoring. Any Scala application will only be as good as the developer(s) building it.

Monday 11 October 2010

Solving Polymorphic Problems using Scala Type Classes

Last week I attended Scala Lift Off London. There were a couple of sessions on Type Classes in Scala, which I found very interesting. In fact, they offer a great solution to a problem that I encountered recently on a Java project (although I'm using Scala code for this entire post).

This particular project had a number of domain objects that could be 'published'. In this particular case, published meant converted to an XML representation which was then passed to a remote server for processing. Multiple domain objects can be published at the same time in a 'publish package'. The challenge was that none of these domain objects had any particular similarities or common interfaces. E.g.

class Page {
  // Page attributes and methods
}

class Story {
  // Story attributes and methods
}

// Other domain objects: image, component etc.

Now, there are a number of ways that these classes could be handled for publishing. An initial suggestion might be to add a Publishable interface to each domain object. However, this is a bad idea for a number of reasons:

  • It couples the knowledge of publishing into domain objects that are used for multiple different purposes
  • The domain objects are complicated for all users who don't require publishing
  • It may not be possible to change the domain objects if they are owned by another project.

So, how do we go about adding these domain objects into the publish package? Well, there are a number of approaches, none of which are ideal. The first is to add a separate method to the publish package for each class that can be published:

class PublishPackage {
  def add(page: Page) = ...  // convert to XML and add to package
  def add(story: Story) = ... // convert to XML and add to package
  ... // And so on
}

This creates a publish package that is very unstable and will soon grow beyond a maintainable size, not good. Another approach is to have some kind of dynamic class-based lookup:

class PublishPackage {
  private val mappers: Map[Class[_], PublishMapper[_]] = ...

  def add(obj: AnyRef) = ... // Lookup mapper by object class and call method on it
}

This is much better as the logic to map each domain object to its publish XML is now outside the publish package. However, it still has some issues in that we need to update the map of mappers each time we have a new domain object type to publish, which may involve changing publish code or configuration. There is also a runtime risk here in that the compiler will let us pass any object to the add method even if it can't be published, so we lose some level of type safety. We also have to write additional code in the add method to handle this case.

This general - I want to handle objects polymorphically but can't because they don't share a common base - problem occurs all too frequently in software, especially when dealing with legacy code bases or objects from external dependencies. Fortunately Scala Type Classes give us an elegant solution to the problem. What we will do is take the second solution described above, but use Type Classes to solve the problems of runtime safety and the need to update the map of mappers.

First, we change the publish package to support adding publish-ready XML instead of any specific or general binding to domain objects:

class PublishPackage {
   def +(nodes: NodeSeq) = ... // Add nodes to publish package
}     

Next we declare a common trait that will be implemented by all the mappers:

trait PublishMapper[T] {
  def formatForPublish(content: T): NodeSeq
}

Next we declare the type class instances (implicit objects in this case) that implement each of the mappers. Note that they are defined as implicit, allowing the correct one to be implicitly selected for the type of domain object being used:

implicit object PagePublish extends PublishMapper[Page] {
  def formatForPublish(content: Page) = ...
}

implicit object StoryPublish extends PublishMapper[Story] {
  def formatForPublish(content: Story) = ...
} 

Next we need to declare the implicit conversion that takes an implicit PublishMapper, which it calls to convert the domain object into the NodeSeq for adding to the publish package:

implicit def objectToPublishXml[T](t: T)(implicit m: PublishMapper[T]): NodeSeq = m.formatForPublish(t)

Let's talk about this one in more detail. It's a parameterised function marked as implicit so we can convert from any T to a NodeSeq, provided that there is a PublishMapper instance available that is also of the parameterised type T. If all the conditions are met then the formatForPublish method is called on the mapper and the result returned.

Now that we have this in place, we can just write the following code:

val somePage = new Page
val someStory = new Story

val p = new PublishPackage
p + somePage
p + someStory

So, now we don't have to modify any existing code when we want to support a new publish type: we just implement a new PublishMapper for it and the compiler will notice that the new implicit conversion is available. Additionally, we are now fully typesafe as the compiler will error if we try to add any domain objects to the publish package for which there is no publish mapper conversion.
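
For example, if a hypothetical Image domain class needed to become publishable (this is just an illustrative sketch of my own, not code from the project), the only addition would be a new mapper instance:

// Hypothetical new domain object - perhaps owned by another team entirely
class Image {
  // Image attributes and methods
}

// The only change required: a new implicit type class instance
implicit object ImagePublish extends PublishMapper[Image] {
  def formatForPublish(content: Image) = <image/> // build the real publish XML here
}

// All existing publish code continues to work unchanged
val p = new PublishPackage
p + new Image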

As we have hopefully seen, Scala Type Classes are an elegant solution to a very common problem, giving us continued type safety for a minimal amount of extra code. In fact, I'd argue that the tiny amount of Scala complexity here saves far more code than we would have to write if we were doing class-based dispatch and runtime type checking.

Friday 8 October 2010

Scala Lift Off London - Day 2

Day 2 of Scala Lift Off London started with some quick-fire presentations on a number of interesting subjects:
  • OSGi and Scala
  • Lifty
  • Minesweeper Problem
Lifty especially looks like a great tool for faster creation of Lift apps using SBT. Certainly something I'm going to be installing.

Scala Test Frameworks and DSLs
The first session I attended was a group discussion about some of the test frameworks for Scala. We looked at both Specs and ScalaTest and compared their approaches and the differences in their DSL offerings. We then looked at ScalaCheck as an alternative solution for testing, which is especially useful when you need to generate a sequence of values to test against.

Finally we took DSLs even further by looking at test narratives for making tests clearer. For example, you can write directly in the test case:

def test() = {
     Alive cell dies when there are 3 neighbours
}

This is implemented by a DSL such that Scala converts it into the following series of chained method calls:

Alive.cell(dies).when(there).are(3).neighbours

A very interesting approach.
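
Out of interest I've since sketched roughly how this kind of narrative could be wired up. This is purely my own illustration of the chaining idea, not the actual implementation shown in the session:

// Marker objects so the words 'dies' and 'there' can appear in the narrative
object dies
object there

// Each step returns the builder, so the infix calls chain together
class CellAssertion {
  private var expected = 0
  def when(t: there.type): CellAssertion = this
  def are(n: Int): CellAssertion = { expected = n; this }
  def neighbours: Boolean = expected == 3 // the real assertion would run here
}

object Alive {
  def cell(d: dies.type): CellAssertion = new CellAssertion
}

// Reads as prose, but compiles to Alive.cell(dies).when(there).are(3).neighbours
val result = Alive cell dies when there are 3 neighbours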

Monads, Functional Concepts, Type Concepts
I decided to have a go at hosting my own session to try to get some clarifications on a number of topics that have been bugging me for a while. I'm coming from a Java background, and thus the functional elements of Scala along with its much more complex type system are the two areas where I have found the transition most difficult.

The aim of the session was to get those people who wanted clarifications on topics to post them on a whiteboard and then get members from the audience who were experienced in these areas to explain what they mean and how to use them in 5-10 minutes each. The session started well with some good audience participation. After a while it then dropped down to just Kevin Wright and Jon Pretty doing most of the work (thanks guys!).

During the hour (and the extra bit that we ran over into lunchtime) we covered:
  • Monads
  • Partial Functions
  • Currying and Partially Applied Functions
  • Monoids and Type Classes
  • Implicit Conversions
  • Self Types
  • Covariance and Contravariance
I learnt a huge amount from this session and a lot of the stuff I was only partially clear on now makes much more sense. I'm pleased I proposed the session as it was exactly what I was looking for.

Minesweeper
The third session of the day was a great talk by Kevin Wright (@thecoda) about implementing the Minesweeper game in Scala and Lift. An excellent example of how to write really great Scala and provide an elegant solution to the problem. Very interesting to see how this was done with only immutable structures and how ajax and Lift were combined in the front-end implementation.

I'd certainly recommend visiting Kevin's github page to explore this code in more detail.

Type Classes and the Cake Pattern
Another great presentation, this time by Jon Pretty. The first part was a more detailed look at Type Classes and the Monoid pattern. This got pretty in-depth, but Jon explained it well. I've since built up an example of using Type Classes to solve a real world problem that I encountered. I'll post a description next week.

The second part of the session was the Cake pattern. This is a great approach to building componentised Scala applications as an alternative to dependency injection solutions like Spring or Guice. I'm already very familiar with the Cake pattern as I've done a fair bit of experimenting with it on my 'discala' experimental project. I'm going to follow up with Jon to perhaps explore some ideas in more detail.
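
For anyone who hasn't seen it, the general shape of the Cake pattern is roughly as follows (a minimal sketch of my own, not code from the session): components are traits, dependencies are expressed as self-types and the 'wiring' happens when the application object mixes everything together.

trait UserRepositoryComponent {
  val userRepository: UserRepository
  trait UserRepository {
    def findEmail(name: String): Option[String]
  }
}

trait UserServiceComponent { this: UserRepositoryComponent =>
  val userService = new UserService
  class UserService {
    def emailFor(name: String): Option[String] = userRepository.findEmail(name)
  }
}

// Wiring by mix-in rather than by a dependency injection container
object Application extends UserServiceComponent with UserRepositoryComponent {
  val userRepository = new UserRepository {
    def findEmail(name: String) = if (name == "jon") Some("jon@example.com") else None
  }
}

// Usage: Application.userService.emailFor("jon") returns Some("jon@example.com")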

Overall I'd have to say that Scala Lift Off was a great two days. There's a very vibrant Scala community in London and across Europe. I learnt lots and was very privileged to be in the same room as some very talented people. Also a big mention to SkillsMatter for hosting such a great event and providing some fantastic facilities. I'm looking forward to next year already!

Thursday 7 October 2010

Scala Lift Off - Day 1

I've been at Scala Lift Off London today (#scalalol). Had a great time and attended talks by some really knowledgeable people. Also met some people doing some really great stuff with Scala. The 'unconference' format took some getting used to at the start, but the group created a great agenda and it all came together nicely. Here's my take on today's talks and some ideas that I might propose for tomorrow.


Session 1 - Moving from Java to Scala
An interesting discussion of how a company moved from building apps in Java to their first Scala deployment, and a good study in how you can move along incrementally, bringing the whole team along as you go. Their progress can be roughly described as:

  1. Start writing unit tests for Java code in Scala
  2. Use Scala for main code, but still using Java coding styles (i.e. Java without the semi-colons, getters/setters and boilerplate)
  3. Starting to make use of Scala features to improve code (e.g. Option instead of null)
  4. Using more of the Scala language features (e.g. closures, immutable types etc.)
  5. Adopt the additional functional aspects of the language (e.g. map/flatMap)

What I thought was very interesting was their decision not to move on to the next level of Scala until all of the team were up to speed with the current level. Also, their approach of getting the whole team on board in terms of style was excellent.
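
To make a couple of these levels concrete, here's a tiny example of my own (not taken from the talk) showing the same lookup written in Java-style Scala and then using Option and higher-order functions:

case class User(name: String, email: String)

// Level 2: Scala written like Java - null and an explicit loop
def findEmailJavaStyle(users: List[User], name: String): String = {
  var result: String = null
  for (u <- users) {
    if (result == null && u.name == name) result = u.email
  }
  result
}

// Levels 3-5: Option instead of null, find/map instead of loops
def findEmail(users: List[User], name: String): Option[String] =
  users.find(_.name == name).map(_.email)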


This certainly proves it is possible to move a team of Java programmers to the Scala language. I'd certainly love the opportunity to take one of the teams that I work with on this journey on a project over the space of 6-9 months.


Session 2 - Akka
A fantastic session with Jonas Boner (@jboner) presenting the main elements of the Akka framework. The actors model for building scalable concurrent applications really excites me. I can see many cases where things that I'm working on now could be made much simpler, but at the same time more scalable and performant by adopting this model.


I can certainly see some major architectural changes in the future for building large systems. Applications composed of thousands of lightweight actors, all working concurrently and passing messages around has got to be better than the current model of a single thread handling a single request and trying to deal with all the synchronisation issues whenever shared state is involved.


Also very impressed with the supervisor model within Akka, where a supervising actor can manage a number of other actors and control how they get restarted in failure cases and so forth. Also, the distributed actor model looks great and is almost seamless (although deployment might be more challenging!)


I'm certainly looking forward to building some actor based applications. I'll post details as I do.


Session 3 - Experience of using Scala and Akka
Although this was an excellent session, it was probably the least valuable one of my day. It was great to hear about a successful project that used Scala and the Akka framework - especially how quickly and easily it all came together. However, not much new information.


Session 4 - Software Transactional Memory (STM)
Another great session with Jonas Boner, looking at the STM support within Akka. I've played a bit with STM in the past, but combining this with the actor model looks very powerful. It's all very new at the moment, but I can see some great applications for this approach. The session also turned into a great discussion of STM and I learnt a lot of good stuff.


Other Thoughts
Martin Odersky (@odersky) did what looked like a great talk on advanced Scala and implicit conversions. Sadly I missed this as I was in the STM session. Looking forward to watching the video on this one.


Today for me was definitely an Akka day. However, a lot of the other sessions seemed to be on quite advanced or specialised topics. I'm hoping tomorrow to focus more on Scala and Lift, with perhaps some more intermediate and generic sessions. Some suggestions I might propose:

  • WTF is a Monad (and other Functional Goodness): I'm coming from a Java background and while I get the functional approach in Scala I still find it fairly challenging to work with. Some real world examples would help
  • The Scala Type System
  • Building Lift Applications for developers familiar with Java frameworks (e.g. Wicket, Tapestry, JSP, JSF)

Looking forward to tomorrow.

Thursday 30 September 2010

Simple and Elegant Property Model

In this post I talked about the need to create simple and elegant solutions to software problems rather than over engineer. I used as an example a property mechanism that I have been working on for my new product. Having now got this to a state where I believe it is both simple and elegant, I thought I'd share the evolution of the Scala code as a way of exploring this topic in more detail.

The goal of this particular set of classes is to provide a mechanism whereby a (super)user can define the properties that can be set against an item. The user creating the item then fills in values for these properties. Both users will be utilising a web interface for defining and setting properties. In addition, properties may be mandatory or optional and can support default values.

Please note that all the code below was developed iteratively using a Test-Driven Development (TDD) approach. I'm not going to show the tests here, so please be aware that every iteration through the model was first backed by a test to define the required behaviour.

So, after my initial attempt at a solution that became far too complex and over engineered I rolled back to the simplest thing that I could possibly write:

case class PropertyDefinition(name: String)

case class Property(name: String, value: String)

object Property {
  def apply(definition: PropertyDefinition, value: String) = Property(definition.name, value)
}

This simple starting point meets the minimum requirements of a definition of a property and then a property that maps a name to a string value. While this is indeed simple, it's too simple in that it doesn't meet all the requirements and it's also not particularly elegant in the way it restricts values to be strings.

The next step was to add support for mandatory/optional functionality. The property definition was easy to change:

case class PropertyDefinition(name: String, mandatory: Boolean = false)

However, the property concept is more challenging as it is now possible to have properties with no value associated. With the general Scala goal of avoiding the use of null to represent no value, I modified property to hold an Option[String] instead:

case class Property(name: String, value: Option[String])

object Property {
  def apply(definition: PropertyDefinition, value: Option[String]) = value match {
    case None if !definition.mandatory => new Property(definition.name, None)
    case Some(_) => new Property(definition.name, value)
    case _ => throw new IllegalArgumentException("Value must be supplied for property")
  }
}


This is still a pretty simple solution. However it's still missing the requirement for default value support and it now seems less elegant because instead of creating a property with

Property(nameDefinition, "value")

I have to use

Property(nameDefinition, Some("value"))

Also, I'm still linked tightly to String value types. First I decided to deal with the default value requirement:

case class PropertyDefinition(name: String, mandatory: Boolean = false, defaultValue: Option[String] = None)

case class Property(name: String, value: Option[String])

object Property {
  def apply(definition: PropertyDefinition, value: Option[String]) = value match {
    case None if definition.defaultValue != None => new Property(definition.name, definition.defaultValue)
    case None if !definition.mandatory => new Property(definition.name, None)
    case Some(_) => new Property(definition.name, value)
    case _ => throw new IllegalArgumentException("Value must be supplied for property")
  }
}
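
As a quick illustration of the behaviour at this point (my own example here rather than the actual tests, which aren't shown), defaults are applied when no value is supplied and a missing value for a mandatory property is rejected:

val colour = PropertyDefinition("colour", defaultValue = Some("red"))
val title = PropertyDefinition("title", mandatory = true)

Property(colour, None)         // Property("colour", Some("red")) - default applied
Property(colour, Some("blue")) // Property("colour", Some("blue"))
Property(title, None)          // throws IllegalArgumentException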


I now have the simplest solution that meets all my requirements. Many developers would stop at this point, but I still feel uncomfortable that the code is not particularly elegant. I don't want to over engineer additional complexity but I'd like to have a cleaner interface to the classes that I have created so far and I'd also like to leave more scope for future changes where possible.

My first improvement was to pull out the tight coupling to a String and instead introduce a PropertyValue class as a wrapper around a string:

case class PropertyValue(content: String) 

case class PropertyDefinition(name: String, mandatory: Boolean = false, defaultValue: Option[PropertyValue] = None)

case class Property(name: String, value: Option[PropertyValue])

object Property {
  def apply(definition: PropertyDefinition, value: Option[PropertyValue]) = value match {
    case None if definition.defaultValue != None => new Property(definition.name, definition.defaultValue)
    case None if !definition.mandatory => new Property(definition.name, None)
    case Some(_) => new Property(definition.name, value)
    case _ => throw new IllegalArgumentException("Value must be supplied for property")
  }
}


Now my property code is not fixed to a string, although my initial implementation of property value is still string based. This has the benefit of keeping the initial version simple but providing scope for someone to refactor the implementation of property value in the future to support different types or other requirements.

So, one element of elegance of the code is improved. Unfortunately it comes at the expense of another. To create a property you now have to type:

Property(nameDefinition, Some(PropertyValue("value")))

Yuk! I need a way to easily build an Option[PropertyValue] from a String in order to avoid all the boilerplate code. This is where Scala implicit conversions come to the rescue. I think their overuse can create code which is very difficult to debug as you are never certain what implicit conversions are being applied. However, in this case they are perfect for bringing the final level of elegance to the code:

object PropertyValue {
  implicit def wrapString(content: String) = content match {
    case "" => None
    case _ => Some(new PropertyValue(content))
  }
  implicit def unwrapString(value: Option[PropertyValue]) = value match {
    case Some(v) => v.content
    case None => ""
  }
}

Given these implicit conversions I can now type:

val property = Property(nameDefinition, "value")

val content: String = property.value

This ticks all my boxes for simplicity and elegance. The code is clean and simple; all the requirements are met. The interface to the classes is also clean and intuitive. However, the solution is also elegant in that it provides ways for future extension without adding excessive additional complexity. Future developers should be able to enhance this code to support more features (such as validation or type safe values) without having to rewrite it from scratch. Time to move on to the rest of my domain model...

Wednesday 29 September 2010

Simple and Elegant

One of the most common traits that I come across in us software developers is the tendency to over-engineer our code. We try to future-proof our code and make it deal with scenarios that may never even be a requirement for the system we are building. Recognising when this is happening and being able to create a simpler and more elegant solution is something we should all be striving towards.

As an example, I'm currently using some of my personal time to build a solution for classifying large amounts of information (more about this in a future post). I'm currently undertaking a TDD exploration of the core domain model. One of the requirements that I have identified is the ability for a user to define a set of properties that can be held against each piece of information. Each property has a name that maps to a value - a fairly simple concept.

As I was writing the test cases and domain classes for this requirement I drifted down a route of allowing typed property values. Thus, rather than just String values, they could be Int, Boolean, ThingaMaBob or whatever you want them to be. However, definitions of properties and assignment of values are managed via a user interface rather than in compiled code so this solution started to grow very complex as it dropped into a world of runtime type validations and conversions and so forth. 

As I wrote more and more code the solution began to smell more and more like over engineering (when you start needing Scala Manifests in a domain model you know something isn't quite right). I could see myself getting bogged down in getting this working, so I decided to take a step back and look at the requirement again. Was I sure that I would actually need typed property values? No, maybe in future but not for any specific cases I could currently identify. Was having typed property values allowing me to get a greater understanding of my domain model? No. Was the requirement for typed property values a valid one, or just an excuse for over engineering? The ultimate conclusion was that I could actually do everything I wanted for the foreseeable future with simple String values and some basic validation rules.

With my new found simplification in hand I dumped a whole load of complex code and ended up with a very simple, very elegant solution that works perfectly for my current needs.

Sadly, I all too often see developers go down this same route but fail to realise that they are over engineering. They then spend an excessive amount of time building a solution that is fragile, difficult to test and unresponsive to changes in requirements that might be different to those they originally imagined when building their solution. These over engineered solutions just contribute to the technical debt of projects that allow them to exist.

Always aim for the simplest and most elegant solution that meets the current set of requirements - nothing more, nothing less. Don't over engineer and don't try to predict future requirements. Note the emphasis on elegant: just because something is very simple doesn't mean it is a good solution. A simple solution can still be built from messy, poorly structured code that is just as difficult to maintain and enhance as its complex counterpart.

If we all focus on this goal of simple, elegant code then software becomes simpler, easier to understand and much easier to maintain and enhance. This then benefits us all through increased productivity and improved ability to deliver.

Tuesday 21 September 2010

The Elusive Inspiration

I'm currently working on a software idea that I have had running around in my head for a number of years. It's based around concepts for classification of information, exploration of relationships between this information and how this changes over time. The domain model is fairly complex as I want to keep it fairly generic and configurable so that individual users can customise it for their own types of information and classifications.

Unfortunately I have come up against a mental block when it comes to this domain model. I think that the ideas have been going round in my head for so long and I have so many different variations and alternatives that I just can't find a way to move forward with something concrete.

This has got me thinking about the mental process and how to unblock my creative self on this project. To this end I've been considering the following ideas...

Go Away and Do Something Else
Often I find that if I can't get my head around something then going off and doing a different project for a while allows my brain time to unwind and process my ideas in the background. After a while I then come back to my original project and more often than not this allows me make good progress. However, on this current project I've been going off and doing something else for ages and it's time I made some progress.

Go For a Thinking Walk
I love walking - especially going somewhere I haven't been before and just exploring. Often I find that during a walk of this type I am able to focus and concentrate on a single topic until I come to some form of resolution. I think I'll try this one at lunchtime today.

Whiteboard Session
I am a very visual person. I find exploring concepts on a whiteboard a great solution for fleshing out designs and examining alternatives. I love colours as well and often find a visual with different element types in different colours really clarifies my understanding. This will probably be my next approach to explore the results of my thinking walk.

Exploratory Coding
Sometimes even the best thinking and most perfect whiteboard drawings just don't capture the essence of a problem. At times like these a couple of hours of hacking out an exploratory solution in code can clarify all sorts of thoughts and ideas. When doing this sort of coding I like to use a very dynamic language (Ruby is my general choice) as I find these allow me to capture lots of thinking in a short amount of time. Increasingly I'm also doing exploratory coding in a Functional language (usually Haskell) as I find this a great way to structure my thinking.

Test Driven Development
A completely alternative approach is to avoid all the up front thinking of how the domain model will work. Instead, define some test cases that describe what the domain model should do and then derive the model out of these tests. My final implementation will certainly be carried out in this way, but for the moment I'm still thinking at a higher level. I feel that I need to better understand exactly what I'm trying to achieve before writing production code.

How do you resolve mental block issues such as these? What other techniques are available?

Additions:


Build a Mind Map
Another thought that I had was to build a mind map to explore all of the thoughts and alternatives. By restricting this mind map to just the domain model that I am interested in I might be able to look at the problem in a new light. I use ConceptDraw MINDMAP on my Mac for this purpose.

Monday 13 September 2010

How much new technology?

As a software architect I regularly have to battle with the question of which and how many new technologies to introduce on any new projects that I'm working on. New projects (or ones offering a full architecture overhaul) are sufficiently rare that there are always numerous language, product or library developments since the last project. So, how do you decide what to include and what to exclude?

The 50% Rule
Over a number of years and many projects I have come to rely on a 50% rule for new technologies. Basically put, any new project can contain up to 50% new technologies, with the remainder being technologies that are well understood and that have been previously used by the majority of team members.

As an example, a number of years ago I started work on a greenfield project for a company that wanted to build some web-based administration tools for their existing (config file driven) applications. This was their first foray into Java and web development and I was given a team consisting of one senior developer and a number of junior devs with basic Java/JSP training plus a bit of Struts 1.x.

After some careful consideration we decided to introduce three major new technologies: Maven for builds, Hibernate for persistence and Spring Framework for the core application framework. We then decided to stick with proven technologies that the team had existing experience of for the remainder of the stack: MySQL as a database and Struts, JSP and JSTL for the presentation layer. Even though there was considerable motivation for using an alternative presentation stack (Tapestry, which I was using at the time, or the early release of what would later become Apache Wicket, which I was a contributor to), the risk of a completely new technology stack, top to bottom, was just too large.

The advantage of taking this mixed approach to technology is that there is a lot less uncertainty on the project and the team is able to be immediately productive using the technologies they already know. The business is therefore more comfortable that progress is being made (rather than the team just playing with 'cool' stuff) and the team has the leeway to learn how to get the most out of the new technologies and to correct any initial mistakes in how they are being used. I just don't feel that this would be the case if every technology being utilised was a new one.

However, sometimes a project is so radically different from anything that has gone before that you just can't find 50% existing technology to bring with you. Thus...

The Incremental Rule
Occasionally, a project departs so much from what has gone before that almost all the technology must be new. By definition this usually means technology that has not been used in a previous project, may not be familiar to the team and/or may never before have been put into production. For these projects some additional 'up-front' work is essential in order to de-risk the new technologies being used.

By way of an example, I am currently doing some initial explorations into an idea for a product for subject classification, relationship navigation and searching. This project will need to deal with very large data sets (> 100 million subjects) and need to support a high degree of concurrency and scalability. For this reason I'm looking into some particular technology choices for this application: Scala, Akka and Cassandra. Given the move to Scala as the main programming language I'm also planning on using the Simple Build Tool (SBT) and investigating the Lift web framework as well. I'm moving my version control from Subversion to Git. Finally, the application will be cloud based so I'm looking to automate deployment and scalability on AWS.

All of the above is a massive amount of new technology to take on board. Were I to just jump in and start the project, I'm certain that I would get into difficulty and end up with either major architecture problems or the need to rewrite large parts of the application as I learn more about each technology. Instead, I've broken the technologies down into a number of smaller 'starter' projects so that I can get familiar with each in isolation and then build on more technologies gradually.

Starting out, I've built a couple of simple projects in Scala, using SBT, to get familiar with the language and to improve my Functional Programming knowledge. Next, I've started to add a web-based UI using the Lift framework. Once I'm happy with these I'm then going to build something using Scala, Akka and Cassandra. Thus, by the time I actually start building the final solution, the only unknown technology will be automated deployment to AWS (and even this is not a total unknown as I've manually deployed to AWS before). 

Building something with so much new technology is always a big risk. But, by taking an incremental approach at de-risking each technology I can manage the risk within acceptable levels and ensure that my final solution is not crippled by lack of knowledge of my chosen technologies.

This then follows on to my final rule....

The Build a 'Strawman' Rule
Regardless of how you select the technologies that will be included in the project, the first iteration should always be to build an end-to-end 'strawman' architecture. This strawman should include all of the selected technologies playing nicely together. Builds should be automated and initial automated test suites (at all levels) should be up and running. Finally, any automated deployment should be in place. The strawman doesn't need to do much functionally, but if it contains enough processing to also allow some basic scalability and performance tests then even better.


By selecting just the right blend of new and existing technologies and spending time de-risking when there are many new technologies we can ensure that we always start a project with confidence that we can make the technologies work for us. Then, by starting with an architectural 'strawman' we further ensure that the technologies work together and eliminate the huge integration risk that we would otherwise hit late in the project when it might be too late to resolve it.