Archive for the ‘Software’ Category.

Goannas Eat Bugs

After my PhD, I worked in industry on the verification and development of software for a safety-critical environmental control system.  The project used a variety of tools and processes to improve and demonstrate product quality.  However, the only static analysis tool being used was lint.  I thought there had to be something better.

As an intern at SQI, I had spent one summer hacking Prolog to extend PASS-C, an extensible static analysis tool for source code.  However, most of the checks in that tool were about programming “style”.  From academia, I knew that model checking had the potential to support more powerful semantic checks.  Model checking was increasingly being used for high-level system and hardware verification, but I wanted a software model checking tool for C and assembly source code.

The most promising tool I could find was PREfast.  Frustratingly, by the time I was looking, PREfast had been acquired by Microsoft for internal use and taken off the market!  There was much gnashing of teeth.  (Microsoft has more recently made it available again.)  PREfast did deliver more powerful analyses, but it didn’t use model checking per se.

Now the tool I had been looking for is finally available: Goanna.  It delivers many of the powerful and precise analyses I had dreamed of, and works for C, C++, and (unusually) embedded assembly.  Goanna comes out of years of research and development at NICTA.  It is packaged as an Eclipse plugin, and is available as a free trial.

Fractal V Lifecycle

At the drinks after Ivar Jacobson’s talk, I was speaking with a project manager from Honeywell who’s about to adopt a more agile development approach.  Honeywell is in the industrial automation business - they do systems engineering to deliver solutions for things like building automation and factory process control.  Their business context is one of the most challenging for agile methodologies:

  • Mature technology/problem space where customers can accurately define most of their requirements up-front.
  • High integrity systems that have regulatory demands for “heavy” process documentation to provide assurance.
  • Many/large development teams.
  • Large systems where no single customer can represent all stakeholders.
  • Customers who don’t have time to be “on site” with the development team full-time during development.
  • Customers who want (or who are required!) to only sign contracts where they know what they’ll be getting.

The last four are standard problems for agile.  Ivar had actually discussed how agile approaches don’t fit some business conditions, but that nonetheless you can often adopt “30%” of the agile practices.  One of these practices is iterative development: it’s certainly been popularized in recent times by agile methodologists, but it’s not unique to agile methodologies.

Ivar presented the waterfall lifecycle as a strawman “unsmart practice” in his talk.  I was surprised by the number of hands that went up in the audience when he asked how many people were using it!  However, Honeywell (like most systems engineering companies I know) doesn’t use the waterfall lifecycle - instead they use a “V” development lifecycle.  The V lifecycle is like the waterfall, but is bent upwards in the middle, with coding and unit testing down at the bottom (pointy end) of the V.   The well-known advantages of the V lifecycle are that it shows how testing lines up with planning at each level of design abstraction, and shows how you can progress test planning early in development, concurrently with design/coding.

V Lifecycle

Classic V Lifecycle, showing three levels here for illustrative purposes - test planning can progress as a parallel activity along the dotted lines

The Honeywell project manager had a problem - how could he do agile development in his business context?  I suggested that he adapt the V lifecycle he was already using.  The structure of the V lifecycle can easily support iterations.  Normally people think of the V lifecycle as a “big V”, spanning an entire development project.  The most obvious way to get an iterative V lifecycle is to have lots of sequential “little Vs”, with each cycle down and back up done as a short V sprint.

Big Iterations of the V Lifecycle

Naive whole-cycle iterations of the V Lifecycle

But that exposes the customer to each iteration, and the Honeywell project manager told me that his customers don’t want to run an acceptance testing and sign-off process every 2-4 weeks during development!  To address this, you can use a less obvious approach, which here I’m calling the Fractal V Lifecycle.

Let’s work our way there in small steps.  To start with, consider that instead of iterating the whole V each time, you can iterate some of the lower parts of the V more frequently.  If you had two low-level iterations, you’d have a W lifecycle!


W Lifecycle! - a V Lifecycle with two low-level internal iterations

To generalize to the full Fractal V Lifecycle, you can see that it’s possible to have many internal iterations at various levels of design abstraction, giving you a (quasi-)fractal V.


Fractal V Lifecycle - the flexibility to iterate at various levels to suit your business, while keeping a lifecycle structure that accommodates traditional assurance processes

I call this “fractal” because it reminds me of the Sierpinski Triangle.


Sierpinski Triangle

The Fractal V Lifecycle is really only quasi-fractal, because there are only a few levels of recursion in a development lifecycle, and because the internal iterations don’t have to be regular or symmetric over the course of development.
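To make the recursive structure a bit more concrete, here’s a small F# sketch.  It’s a hypothetical model for illustration only (the VCycle type, the level names, and the activities function are all made up): each level of design abstraction can contain any number of passes through the level below, so both the finite depth and the irregular iterations fall out naturally.

```fsharp
// Hypothetical model of a (quasi-)fractal V lifecycle. Each level goes down
// through its design step, runs zero or more passes of the level below,
// then comes back up through its matching test step.
type VCycle =
    | Code                                        // the pointy end: coding and unit testing
    | Level of name: string * inner: VCycle list

// Flatten a cycle into the order in which its activities happen.
let rec activities (v: VCycle) : string list =
    match v with
    | Code -> [ "code and unit test" ]
    | Level (name, inner) ->
        [ yield name + " design"
          for sub in inner do
              yield! activities sub
          yield name + " test" ]

// The lowest design level, iterated as a unit.
let moduleV = Level ("module", [ Code ])

// Classic V with three levels: one pass down and one pass back up.
let classicV = Level ("system", [ Level ("architecture", [ moduleV ]) ])

// W lifecycle: two low-level iterations inside a single architecture pass.
let wCycle = Level ("system", [ Level ("architecture", [ moduleV; moduleV ]) ])

// Quasi-fractal: irregular iterations at more than one level, but the
// customer-facing system design and system test still happen only once.
let fractalV =
    Level ("system", [ Level ("architecture", [ moduleV; moduleV ]); Level ("architecture", [ moduleV ]) ])
```

Calling activities fractalV gives one long sequence that starts with "system design" and ends with "system test", with all the internal iterations nested in between.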

The Fractal V Lifecycle solves a problem - it lets you do iterative development when your customers only want to be involved at large infrequent milestones.   It gives you the flexibility to adapt your iterations to suit your business conditions and technical environment.   But it also retains the shape of the V, which lets you keep using your existing systems engineering disciplines to comply with customer/regulatory requirements for process assurance.

There’s one thing that the Fractal V Lifecycle doesn’t explicitly decide for you - when should your iterations finish?  This is a major difference between iterations in agile and traditional plan-driven methodologies: agile development approaches have time-boxed iterations (usually between 1 and 4 weeks long), but traditional development approaches have iterations defined by scope.  The Fractal V Lifecycle is consistent with either approach.

Augmented Reality is Here

The classic picture of augmented reality is having a large helmet on your head, wearing funny glasses, and carrying around massive laptop computers and batteries in a backpack.  Your view of the real world is continually overlaid with rich and complex digital information.

But I’ve started to see augmented reality appear in other ways.  The first is Livescribe - a digitised pen (well, really a pen with a camera, writing on digitised paper).  It’s quite amazing to see that your hand-written scratchings can be uploaded and then searched electronically on your computer.  Even more powerful, the pen can also record audio, and automatically index those audio segments by the scratchings on the paper.  So it’s great for meetings - you can take notes just like normal, but if you forget what a particular point meant, you can instantly replay the audio from the moment you jotted it down.

Note: I think users will have to be very careful about managing audio recorded during meetings, given the constraints imposed by the NSW Surveillance Devices Act 2007 and similar legislation elsewhere.

So Livescribe is a kind of augmented reality - here you can think of “reality” as being the written notes on your paper, and/or the sounds made that got recorded.  The “augmentation” for written notes is the audio stream, and vice-versa.  The written notes are also augmented with a search capability when they’re uploaded to your computer.

The second piece of augmented reality technology is a little more “traditional”, but even in its prototype form it’s been cut back to be lightweight and inexpensive.  It comes out of the MIT Media Lab, and is built from a mobile phone, a portable projector, and a webcam - all commercially available “off the shelf” components.  See this TED talk for a glimpse.

A quote from William Gibson seems appropriate in this context:

The future is here, it’s just not evenly distributed yet.

Launch of an Enterprise Management Forum

On Monday I saw an entertaining and thought-provoking talk in Sydney by Ivar Jacobson, one of the inventors of UML.  The talk was about agile development - what they don’t teach you in school.  Ivar discussed the eternal problem of developing good software, quickly, at low cost.   There’s a convergence happening between software engineering and enterprise technology, and so the talk was a good way to gather a crowd as part of a soft launch of the Alinement Network - a new online community for enterprise management technology practitioners.

Enterprise management is key for most complex businesses.  The technology powering the operations and management of the enterprise is a big market, and the vendors are good at building communities to support their users.  But surprisingly there aren’t many places on the net where people can discuss the discipline, theories, and technologies in a vendor-neutral space.

So, the Alinement Network could be an interesting space.  The guy behind it is Louis Taborda, a Sydney-based practitioner with a long history working in the enterprise architecture and application lifecycle management tools market. I first met Louis when he was finishing his PhD at MGSM - on configuration management and change management for enterprise systems.  Change (enabled by what Ivar called software “extensibility”) is at the heart of most of the hard problems in enterprise management technology.

Parsing CSV Files in F#

Work has presented me with a small data manipulation exercise. That’s another opportunity to do some more scripting in F#!

This time I’m processing some Comma-Separated Value (CSV) files.  CSV files are one of the lowest forms of semi-structured data, used for representing a simple table of data textually.  The basic idea is easy - values with commas between them - so CSV files are widely used.  You might think parsing them is trivial.  It can be, if you’re lucky.  But sometimes the values contain commas, so they get quoted, and then any quotes inside the values have to be escaped.  There are many variations on the “basic” idea.
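To see how quickly a naive approach goes wrong, here’s a tiny F# sketch that just splits a line on every comma.  The sample line is made up for illustration; it isn’t a parser, it only shows the failure.

```fsharp
// A three-field record whose middle value contains a comma, so it has been quoted.
let line = "1732,\"Perez, Juan\",435.00"

// Naive approach: split on every comma, ignoring quotes.
let naive = line.Split(',')

// The quoted comma splits the name in two, so naive has four pieces, not three:
// [| "1732"; "\"Perez"; " Juan\""; "435.00" |]
```

Fixing this means tracking whether you’re inside quotes, then handling escaped quotes, and so on into the maze.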

So, writing a “quick CSV parser” can lead you into a maze of twisty little passages. You don’t want to pull out lex and yacc and roll your own full-blown grammar parser, because the whole point of CSV files is they’re supposed to be lightweight and easy!

Next time you need to write a CSV parser, don’t!  You don’t have to reinvent the wheel - other people have already written well-tested libraries you can use.  I’ve been using the open source .NET FileHelpers library in my F# scripting exercise.  (I tried the Jet ADO adapter first, but got a strange hard crash I couldn’t be bothered to debug.  Anyway...)

It’s easy to use FileHelpers from F#.  Here’s how, transliterating the example from the FileHelpers site.  Let’s say this is the file “FileIn.txt”:

1732,Juan Perez,435.00,11-05-2002
554,Pedro Gomez,12342.30,06-02-2004
112,Ramiro Politti,0.00,01-02-2000
924,Pablo Ramirez,3321.30,24-11-2002

First, define a (typed) class to represent a row in the CSV file. For the example, the F# type definition might look like this:

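open System
open FileHelpers

// One instance of this class per line of the file, one field per column.
[< DelimitedRecord(",") >]
type Customer =
    class
        val CustId : int
        val Name : string
        val Balance : decimal
        // Dates in the file are day-month-year, e.g. 11-05-2002.
        [< FieldConverter(ConverterKind.Date, "dd-MM-yyyy") >]
        val AddedDate : DateTime
    end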

Note the use of attributes above (in [< ... >] brackets). These are annotations that are carried into the compiled code, and can be accessed later by other tools using reflection. The attributes on the type above (e.g. DelimitedRecord) control how FileHelpers treats the overall representation of the file, and attributes on each of the fields (e.g. FieldConverter) are used to control the treatment of values in the corresponding columns in the file.
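If you haven’t met attributes and reflection before, here’s a minimal self-contained F# sketch of the mechanism.  SeparatorAttribute, Row, and separatorOf are hypothetical names standing in for what FileHelpers does with DelimitedRecord; the only real APIs used are System.Attribute and Attribute.GetCustomAttribute.

```fsharp
open System

// Hypothetical attribute, playing a role similar to FileHelpers' DelimitedRecord.
[<AttributeUsage(AttributeTargets.Class)>]
type SeparatorAttribute(sep: string) =
    inherit Attribute()
    member this.Separator = sep

// Annotate a type; the compiler stores the attribute in the type's metadata.
[<Separator(",")>]
type Row() = class end

// A tool can read the annotation back at run time using reflection.
let separatorOf (t: Type) : string option =
    match Attribute.GetCustomAttribute(t, typeof<SeparatorAttribute>) with
    | :? SeparatorAttribute as a -> Some a.Separator
    | _ -> None   // no such attribute on this type
```

So separatorOf typeof<Row> gives Some ",", while an unannotated type such as string gives None.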

Next, create a parsing engine based on the type, like so:

let engine = new FileHelperEngine(typeof<Customer>)

and then you’re good to go:

let res = engine.ReadFile("FileIn.txt")

Actually, there is a wrinkle here.  res is an obj array, but you’d prefer it to be a Customer array.  You can’t use the ordinary F# dynamic downcast directly, because F# doesn’t treat obj[] as a supertype of Customer[] (only the element type obj is a supertype of Customer).  So you need to write and use an auxiliary type-casting function, like this:

let downcast_Customer_Array = Array.map (fun (a:obj) -> a :?> Customer)
let res_Customers = downcast_Customer_Array res
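If you have several record types, you might prefer one generic helper instead of one per type.  Here’s a sketch; downcastArray and the Item class are hypothetical (Item just stands in for a FileHelpers record so the example runs on its own), while unbox and Seq.cast are standard FSharp.Core functions.

```fsharp
// Generic version of the helper: works for any record type, not just Customer.
let downcastArray<'T> (xs: obj[]) : 'T[] = Array.map unbox<'T> xs

// Stand-in for a FileHelpers record type, so the sketch runs without the library.
type Item(name: string) =
    member this.Name = name

let boxed : obj[] = [| box (Item "a"); box (Item "b") |]
let items = downcastArray<Item> boxed

// FSharp.Core's Seq.cast does the same element-by-element cast for any sequence.
let items2 = boxed |> Seq.cast<Item> |> Array.ofSeq
```

With the engine above, downcastArray<Customer> res would then play the same role as downcast_Customer_Array res.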

You end up with an array of your values in your newly defined type, which you can use in the ordinary way, e.g. the date for the first customer is:

res_Customers.[0].AddedDate

Easy, huh? Much easier than writing your own parser.

FileHelpers has a few other tricks if you need them.  I’ve been using extra attributes to tell FileHelpers that some fields are quoted, and converter attributes to help parse my dates.  I’ve also been using a custom converter to parse a value which was itself a comma-separated list of values.  (The only wrinkle there was not being able to use F# lists as .NET objects - I had to go via ResizeArray objects instead.)
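The heart of that list-valued converter is just splitting the inner value into a ResizeArray.  Here’s a sketch of that part only: parseInnerList is a hypothetical name, it leaves out the FileHelpers converter class it would plug into, and it assumes the inner items are separated by plain commas with no further quoting.

```fsharp
open System

// Hypothetical helper: the parsing half of a custom converter. It takes a field
// value that is itself a comma-separated list (already unquoted by the outer
// CSV parsing) and returns a ResizeArray, i.e. a System.Collections.Generic.List<string>,
// which other .NET code can consume directly.
let parseInnerList (value: string) : ResizeArray<string> =
    if String.IsNullOrWhiteSpace value then
        ResizeArray<string>()          // empty field: empty list
    else
        let items = value.Split(',') |> Array.map (fun item -> item.Trim())
        ResizeArray<string>(items)
```

For example, parseInnerList "red, green,blue" gives a three-element list with the surrounding spaces trimmed.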

A Medium Communicates with the Spirit of Blogging

The discussion earlier this year about the death of the blogosphere is surely exaggerated.  OK, my blog was “resting” for most of this year. My excuse is general busy-ness - moving house, moving office, and changing roles at NICTA. But recently I had the enthusiasm and time to write a flurry of blog entries. (I’ve had more time during my newly extended train commute, but notwithstanding that, I can see my blogging enthusiasm comes in bursts…)

Isn’t this how most of the blogosphere works?  Most bloggers are amateurs, writing about their family, pets, or hobbies.  The glamorous fantasy of blogging driving democratic journalism and incisive public commentary is true, but it’s only ever been true for a tiny part of the whole.

It takes a certain perverse commitment to blog regularly if you’re not being paid for it. Of course increasingly, some bloggers do get paid for it - either as journalists, company employees or, for an elite influential few, through significant online ad revenue. But the fact that some people are paid for it doesn’t significantly affect the cost or value of blogging for the mass of amateurs.

So Nick Carr’s wrong to say that blogging “outside the bounds of the traditional media is gone” - the blogosphere is not dead.  Yes, as the Economist says, “Blogging has entered the mainstream” – blogging is now accepted as part of the spectrum of modern media.  But non-mainstream blogging hasn’t died.  There are still plenty of blogs about family, pets, and hobbies, and there are still individuals reporting and providing independent social commentary.

Ironically, at the same time as blogging technology is becoming accepted by the mainstream media, it’s also becoming accepted for other, more industrial purposes.  Blogging technology is no longer just for blogging - the formats that support blogs (RSS, Atom, etc.) are used as a REST representation for time-series content, including mundane things such as home loan product announcements.

Blogging certainly isn’t dead: both socially and technically, it’s growing and adapting.  It used to be the message; now, increasingly, it’s the medium.

ICSOC Day 3 Keynote - Infrastructure as a Service

I had to miss the second day of ICSOC, but was back for the morning of the third, and another great keynote, on Web-Scale Computing, from Peter Vosshall – a VP and Distinguished Engineer at Amazon. Amazon needs a highly reliable and scalable infrastructure internally to run its retail business, but has also been selling web services infrastructure to third parties. Peter spoke about EC2 (compute), SQS (messaging), S3 (storage of blobs with metadata), SimpleDB (storage of lightly-structured data with indexed queries), and EBS (storage for EC2 when you need a traditional filesystem or database).

As an example of how companies are using and benefitting from these services, he talked about a company called Animoto.  On their website you can upload a song and some photos, and they automatically build a video montage, matching transitions to beats.  They started with around 5000 customers in total, but after they built a Facebook app and got some viral awareness, they shot up to 5000-10000 users per hour.  They had deployed on EC2 and ramped up to 3500-5000 instances.  It looked like a neat story.

The business benefits of using the web services are having a capability for fast incremental infrastructure growth, and turning what would have been fixed capital expenses into variable operating expenses.  (Coincidentally I had also mentioned this latter benefit of web services in a podcasted interview I was in on Monday.)

As well as supplying web services, Amazon’s using them internally too.  Peter briefly reviewed how Amazon started as what looked like a 2-tier+web client-server web application, but then refactored that incrementally (and painfully, over 2002-2003) into a collection of services.  They’ve seen reliability benefits – he said they can lose an entire data centre with no impact on the customer experience.  They’ve also had product management benefits – each service maintains its own data and operating responsibility, which lets each one evolve at its own pace.  Amazon’s key non-functional properties (NFPs) are security, incremental scalability, availability (systems fail not by stopping, and failures aren’t independent), performance (not just mean performance but also performance in outlying cases), and cost-effectiveness.

Intriguingly, despite the claimed product management benefits, he said that 70% of development time was spent on “undifferentiated heavy lifting” delivering updated services – dealing with non-functional, administrative service management issues.  So only 30% of their effort is spent improving the customer experience. I think their delivered experience could certainly use some extra work, especially for their non-US customers!

ICSOC Day 1 Keynote - Services for Science

The 6th International Conference on Service Oriented Computing is on in Sydney this week.  NICTA is a sponsor, and I managed to score a registration to attend.  Ian Foster opened with an interesting keynote.  (Preceded by a 30-minute delay fussing with Mac technology issues!)  He spoke on “Services for Science” - how SOA is being used to support knowledge creation in science.  Currently there’s surprisingly strong growth in online services providing data and analysis, in astronomy and especially in the biomedical field.  He talked about the caGrid network.  Ontologies are key there for meta-data of experimental results - Ian commented that the community is very “neat” (not scruffy) in being explicit and standardised in the representation and organisation of their data.

It’s interesting that for representing scientific workflow they’ve dropped BPEL in favour of the workflow notation and supporting infrastructure in Taverna. The workflows are used not only to coordinate data and analyses, but also to communicate methods and in principle to promote reuse. But the caGrid leaders recognise that it’s hard to design for workflow reuse, and hard to achieve reuse in practice.  Ian also discussed experimental use of functional programming techniques to support provenance - to capture computations as a first class entity for scientific audit, review, and mining. He finished with some discussion of scalability and text mining of research publications.

I think some of the issues now being explored in the e-science domain have interesting analogues that have already been thrashed out in software engineering.  The two fields are quite similar in some ways - in both, when practised at an industrial scale, teams of knowledge workers work on complex and partly-shared electronic assets.  Large-scale reuse and variation has been made methodical in Software Product Line Engineering, and provenance issues are very similar to those that are well known in the established discipline of (Software) Configuration Management.

COAG Invests in a National Electronic Conveyancing System

COAG met on Saturday and decided to invest in implementing a national approach to conveyancing - the National Electronic Conveyancing System (NECS).  Currently each of Australia’s eight states and territories has its own system for dealing with the transfer of real estate.  You might not think that’s a big deal - after all, wherever you are, the house you buy is only going to be in one state!  Why does it matter to have a uniform national system?

At an abstract level, from the public’s point of view, when you buy a house there’s just a buyer, a seller, and a central land registry that maintains the “golden truth” about ownership under the standard Torrens system of title.  It’s a little more complicated than that, because mortgages for housing loans are also registered with land registries, so banks and non-bank lenders are normally involved too.  It’s more complicated still, because there’s a whole raft of other auxiliary entities involved in title exchange, such as title search companies, lawyers, property valuers, and insurers providing related services.  The whole industry (banks, non-bank lenders, and the auxiliary service organisations) operates nationally.  Currently they need to implement and maintain systems to deal with the land registry systems in each of the eight states and territories.  In the past conveyancing has been a manual process, and human processors have been able to cope with the inefficiencies of working with multiple interfaces.

However, access to land registries is starting to move online, to reduce the cost and time of buying real estate.  When conveyancing becomes automated, there’s a large initial cost borne by everyone in the industry to integrate with the new system(s).  Companies would prefer to pay this initial overhead cost once, not eight times!

NECS is intended to address this problem.  The goal is not to create a single national land registry, but instead to create a single national interface to all of the state and territory land registries. Organisations will be able to integrate with the national interface, and gain access to the land registries in every state.
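As a rough illustration of the difference between a single national registry and a single national interface, here’s a hypothetical F# sketch.  Every type and function name is invented, and the real NECS design is certainly far richer; the point is only that the registries stay separate while industry code calls one entry point, so n organisations need n integrations rather than 8n.

```fsharp
// The eight Australian states and territories.
type State = NSW | VIC | QLD | WA | SA | TAS | ACT | NT

// A simplified, hypothetical transfer request.
type Transfer = { State: State; TitleRef: string; Buyer: string }

// Each state or territory registry keeps its own system and its own records.
type ILandRegistry =
    abstract Lodge : Transfer -> string

// Stand-in for a real registry system; here it just reports what it did.
let makeRegistry (state: State) =
    { new ILandRegistry with
        member this.Lodge t = sprintf "%A registry: %s transferred to %s" state t.TitleRef t.Buyer }

// The single national interface: a lookup table of the per-state registries,
// and one entry point that routes each request to the state the land is in.
let registries =
    [ NSW; VIC; QLD; WA; SA; TAS; ACT; NT ]
    |> List.map (fun s -> s, makeRegistry s)
    |> Map.ofList

let lodge (t: Transfer) = registries.[t.State].Lodge t
```

A lender integrates only with lodge; whether the property is in Victoria or Tasmania is just a field in the request.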

Our group at NICTA has been working with NECS, looking at issues in the definition and management of business vocabularies, business rules, and business processes.  NICTA’s research philosophy is “use-inspired research” - working on fundamental scientific advances and technology innovations in the context of, and with an understanding of, real-world problems.   The goal is to do research that has more impact, and benefits Australia.  Our work with NECS is an example of all of this.   It’s still early days, but having a deep engagement with conveyancing and e-government has already been important to motivate and direct the research we’re doing.

Computer Science vs Software Engineering

My University education is in Computer Science, but my professional life and renewed research career are in Software Engineering.  A lot of people (and perhaps some University departments!) probably think these are just the same thing, with different names.  But in my transition to Software Engineering I’ve discovered they’re very different, and I think their difference is not all down to the normal arguments about science vs engineering.

In Computer Science, the “unit of analysis” is the procedure (in the sense of effective procedure, but I also mean to include non-terminating processes).  Entities of interest include algorithms and data-structures, interfaces, ADTs, types, and languages for expressing them.

In contrast, in Software Engineering, the unit of analysis is the whole software system.  Here the entities of interest include architectures, and system models. A whole software system is not just “bigger” in size than a single procedure/process.  It also has many more different kinds of functionality, many more developers, and many different users and other stakeholders.

There are a lot of common themes across Computer Science and Software Engineering.  For example, both are concerned with issues such as specification, construction, distribution, performance analysis, and verification.

The challenges for Software Engineering are not just dealing with the scale of the system, but also dealing with the scale of the development of the system. The challenges are not just technical, they’re also socio-technical.  So although Computer Science and Software Engineering both deal with software and have many common themes, their technologies and methodologies are usually quite different because they’re dealing with different kinds of entities in different contexts.

Computer Science and Software Engineering are very different disciplines.