Archive for the ‘Software’ Category.

Reflections at WICSA

WICSA was fun.  I usually find the most I can hope for in a conference is 1 or 2 papers that are really interesting, but I think WICSA cleared 5, so it was well worthwhile.  What I particularly enjoy about conferences is hearing how people verbally describe the ideas and challenges in the field.  You can get so much more nuance and emphasis from hearing people talk about their research, compared to just reading papers.

A great example was the final keynote for the conference, by Alexander Wolf.  He covered reflections on his personal history working in software architecture, but as one of the “fathers” of the field, his talk was also a history of early software architecture research.  It was fun to play “spot the co-author” in the audience and among other acquaintances.

He talked about the importance of simulation and experimentation for architecture, and called for more work to be done in the area.  At NICTA, Jenny Liu and Paul Brebner have been leading work in these areas, particularly for performance analysis of enterprise architectures.  They’ve been getting huge interest from industry.  It’s a very promising approach and I can support the observation that simulation and experimentation are critically important to the discipline of software architecture.

Alexander Wolf was also previously involved with Software Configuration Management research, which is an interest of mine.  He didn’t really elaborate on that line of work, but he did mention a paper of his discussing the relatedness of software architecture and configuration management.  I think there’s still a lot more that can be said in this area, particularly concerning architecture evolution.

The BASE of CREST

Yesterday WICSA 2009 finished. There were a number of interesting talks over the three days of the conference. One was by Richard Taylor on Architectural Styles for Runtime Software Adaptation. He was discussing a framework (BASE) for comparing approaches to dynamic runtime adaptation. The model classifies how various architectural styles deal with Behavior, Asynchrony, State, and Execution Context for adaptation.

One of the frameworks being analysed with BASE was CREST – Computational REST. In CREST, pieces of computation are represented as URLs and can be moved around the web just as static content is. Richard gave a demo of CREST in action – showing pieces of independent computation running and serving dynamic content to multiple distributed browsers. It certainly had a “wow” factor. It reminded me very strongly of the Google Wave demo. But CREST is a more general architecture – it’s not committed to the threaded content model that’s deeply built into Google Wave. Could you reimplement Google Wave on top of the CREST framework? It looks plausible, and it might also help you create and share a much richer variety of dynamic content – to put yourself ahead of the Wave (pun intended).

I had a few questions (some of which were prompted by discussions with Liming Zhu) but I didn’t get a chance to pin down Richard after the talk…

The first question is about CREST, but not BASE. We can observe that REST is “broken” on the web. For example, cookies aren’t part of (and violate!) the REST principles, but they are nonetheless essential to the workings of the Web. That’s fine – pragmatics will almost always get in the way of a naive realisation of an abstract model. So my question is – does CREST need to be “broken” for it to be workable, and if so, how?

My second set of questions is about the BASE framework discussed in the paper. What limitations do the various architectural styles place on the scope of adaptation? How do you get assurance about invariant functionality? Why doesn’t BASE consider security? Dynamic adaptation is great, but not everything will be dynamically variable, and you probably want to know that some functionality won’t vary, and won’t be subverted at runtime. How do the various architectural styles enable that?

The Next Big Thing?

How can you tell what the next big thing is going to be? Google’s PageRank algorithm will tell you which web pages have been important enough in the past for other people to have linked to.   Google Trends will tell you what search terms people have been using recently, again in the past. What about the future?

Some predictions about the future are doomed to failure.   For example, Popper’s Poverty of Historicism is largely about the futility of predicting future society. However, some aspects of the future are largely predictable – science and technology work because they accurately predict the behavior of the physical world.  There’s a large middle ground of futures that aren’t easy to predict.  Prediction markets have been proposed as a way of getting better-than-chance predictions of these events.

Ricky Robinson at NICTA has recently launched citemine – a prediction market for academic papers.  The predictions being made are about how much each paper will be cited by other papers.  Ironically for citemine, one of the poverties of historicism that Popper identifies is a poverty of imagination about the possibilities of the impact of future science and technology! (Still, I imagine that Popper’s criticism only applies to long-term predictions of the impacts of science on society, not the shorter-term predictions of the importance of recently published scientific papers.)

The benefit of citemine is that it can be a leading indicator of the quality of publications, whereas existing citation metrics are very lagging indicators of the quality of publications and researchers.   Ricky’s hope is that academics will care enough to trade in citemine to acquire its “Reals”, which may become a widely recognised measure of academic reputation.  Your personal worth in Reals is a measure of two things: your ability to have written highly cited papers, and how much better you have been than others at spotting papers that will be highly cited.  You can tell how much of your worth is due to each source.  (Interestingly, I think both of these are lagging indicators, even though the market price of a publication is a leading indicator.)

Even if such a market could work well if universally adopted and in a steady state, it’s a challenge to launch it.   It’s a chicken and egg problem – activity is required to make the market function, but a functioning market is required to generate interest in being active in the market.  The market has to bootstrap Reals into having value in the real world somehow.

citemine is “very beta”, and there are certainly a few issues at the moment:

  • Some matches aren’t being made in the market – there are buyers and sellers at the same price who aren’t doing a deal.   (Looks like a bug?)
  • There’s currently very low market depth, especially among sellers.
  • There’s no sophisticated market overview mechanism – just a list of papers at their current prices.
  • There are no market metrics for papers – e.g. historical returns, price volatility, etc.

Ricky’s paper explains citemine.  I have two queries, and two observations…

Is citemine a zero-sum game?  In citemine, Reals are given to shareholders as dividends based on citations, but those Reals come from the previously-paid cost of submitting the citing papers.  So it looks like a zero-sum game. In my limited understanding of the economics of real stock markets, value gets created through primary production and through productivity improvements in other sectors.   I don’t see how that happens in citemine. Which leads me to my next query…

Is citemine a pyramid/Ponzi scheme?  In citemine, the only source of new Reals is from the registration of new users, whose initial allocation of Reals is used to submit papers to pay dividends for existing users.  This question is more stark because there’s no leverage in citemine (debt, shorts).  Maybe I’m just confusing value with liquidity.

My intuition is that manuscripts in citemine will behave more like mining stocks than industrial stocks in real stock markets. (Is that why it’s called citemine? :-) )  Mines have a finite quantity of ore, and the value of the stock for a mine decreases as the ore is removed from it.   The value of a manuscript in citemine derives from future citations, but for almost all scientific papers, there is a finite time horizon for possible citation.  At some point people lose interest in moderately influential papers and cite later derived works.  Even very influential papers become part of assumed/background knowledge and get cited less.  I think that in citemine, most manuscripts will trend to a near-zero market price.
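A rough sketch of the mining-stock intuition, under assumptions that are mine rather than citemine’s: suppose a paper’s expected citations decay geometrically each year, and suppose the “ore left in the mine” is simply the sum of expected citations still to come within a fixed horizon.  The starting rate, decay factor and horizon below are made-up illustrative numbers, not data.

```fsharp
// Hypothetical citation profile: 10 expected citations in year 0,
// shrinking by 30% each year. These numbers are purely illustrative.
let expectedCitations (year: int) : float =
    10.0 * (0.7 ** float year)

// "Ore left in the mine": expected citations still to come,
// from year t up to a finite citation horizon (an assumed cut-off).
let remainingValue (horizon: int) (t: int) : float =
    [ t .. horizon ] |> List.sumBy expectedCitations

// Print how much citation value remains at a few points in the paper's life.
for t in [ 0; 5; 10; 20 ] do
    printfn "year %2d: remaining expected citations = %.2f" t (remainingValue 20 t)
```

Under these assumptions the remaining value only ever falls as the paper ages, which is the sense in which a manuscript resembles a mine being worked out rather than a business that keeps producing.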

Finally, there’s a “meta-gaming” anomaly currently at play in the citemine market. If it turns out to be a successful market, then Reals get real value, and Ricky’s citemine paper (and closely related papers by other authors) will also inevitably be highly cited.  If the market fades into obscurity, then the free Reals you get on joining stay as play money, so it doesn’t matter how you spent them.   Ricky’s paper (and related papers) are a safe bet – you can’t lose!  I would have bought some, but no one was selling – and I have no idea how to pick a good price to offer!

Goannas Eat Bugs

After my PhD, I worked in industry on the verification and development of software for a safety-critical environmental control system.  The project used a variety of tools and processes to improve and demonstrate product quality.  However, the only static analysis tool being used was lint.  I thought there had to be something better.

As an intern at SQI I had worked one summer hacking Prolog to extend PASS-C, an extensible source code static analysis tool.  However, most of the checks in that tool were about programming “style”.  From academia, I knew that model checking had potential to support more powerful semantic checks.  Model checking was increasingly being used for high-level system and hardware verification, but I wanted a software model checking tool for C and assembly source code.

The most promising tool I could find at the time was PREfast.  Frustratingly, at the time I was looking, PREfast had been acquired by Microsoft to be used by them internally, and had been taken off the market!  There was much gnashing of teeth.  (Microsoft has more recently made it available again.)   PREfast did deliver more powerful analyses, but it didn’t use model checking per se.

Now the tool I had been looking for is finally available: Goanna.  It delivers many of the powerful and precise analyses I had dreamed of, and works for C, C++, and (unusually) embedded assembly.  Goanna comes out of years of research and development at NICTA.  It is packaged as an Eclipse plugin, and is available as a free trial.

Fractal V Lifecycle

At the drinks after Ivar Jacobson’s talk, I was speaking with a project manager from Honeywell who’s about to adopt a more agile development approach.  Honeywell is in the industrial automation business – they do systems engineering to deliver solutions for things like building automation and factory process control.  Their business context is one of the most challenging for agile methodologies:

  • Mature technology/problem space where customers can accurately define most of their requirements up-front.
  • High integrity systems that have regulatory demands for “heavy” process documentation to provide assurance.
  • Many/large development teams.
  • Large systems where no single customer can represent all stakeholders.
  • Customers who don’t have time to be “on site” with the development team full-time during development.
  • Customers who want (or who are required!) to only sign contracts where they know what they’ll be getting.

The last four are standard problems for agile.  Ivar had actually discussed how agile approaches don’t fit some business conditions, but that nonetheless you can often adopt “30%” of the agile practices.  One of these practices is iterative development: it’s certainly been popularized in recent times by agile methodologists, but it’s not unique to agile methodologies.

Ivar presented the waterfall lifecycle as a strawman “unsmart practice” in his talk.  I was surprised by the number of hands that went up in the audience when he asked how many people were using it!  However, Honeywell (like most systems engineering companies I know) doesn’t use the waterfall lifecycle – instead they use a “V” development lifecycle.  The V lifecycle is like the waterfall, but is bent upwards in the middle, with coding and unit testing down at the bottom (pointy end) of the V.   The well-known advantages of the V lifecycle are that it shows how testing lines up with planning at each level of design abstraction, and shows how you can progress test planning early in development, concurrently with design/coding.

Classic V Lifecycle, showing three levels here for illustrative purposes - test planning can progress as a parallel activity along the dotted lines

The Honeywell project manager had a problem – how could he do agile development in his business context?  I suggested that he adapt the V lifecycle he was already using.  The structure of the V lifecycle can easily support iterations. Normally people think of the V lifecycle as a “big V”, spanning an entire development project.   The first obvious way to have an iterative V lifecycle is to have lots of sequential “little Vs” – each cycle up and down could be done as short V sprints.

Naive whole-cycle iterations of the V Lifecycle

But that exposes the customer to each iteration, and the Honeywell project manager told me that his customers don’t want to run an acceptance testing and sign-off process every 2-4 weeks during development!  To address this, you can use a less obvious approach, which here I’m calling the Fractal V Lifecycle.

Let’s work our way there in small steps.   To start with, instead of repeating the whole V in each iteration, you can iterate some of the lower parts of the V more frequently.  If you had two low-level iterations, you’d have a W lifecycle!

W Lifecycle! - a V Lifecycle with two low-level internal iterations

To generalize to the full Fractal V Lifecycle, you can see that it’s possible to have many internal iterations at various levels of design abstraction, giving you a (quasi-)fractal V.

Fractal V Lifecycle - the flexibility to iterate at various levels to suit your business, while keeping a lifecycle structure that accommodates traditional assurance processes
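The nesting can be sketched in a few lines of F#.  This is only a toy model, not part of any real process tool: the Level type, the level names and the InnerIterations counts are all hypothetical.  Each level says how many times the part of the V below it is repeated before testing at that level.

```fsharp
// Hypothetical model: one level of design abstraction in the V.
// InnerIterations = how many times everything *below* this level is repeated.
type Level = { Name: string; InnerIterations: int }

// Walk down the left arm (design), repeat the lower part of the V as often
// as this level asks for, then come back up the right arm (test).
let rec phases (levels: Level list) : string list =
    match levels with
    | [] -> []
    | level :: lower ->
        let inner =
            List.replicate level.InnerIterations (phases lower) |> List.concat
        [ "design " + level.Name ] @ inner @ [ "test " + level.Name ]

// Classic V: every level is visited exactly once.
let classicV =
    [ { Name = "system"; InnerIterations = 1 }
      { Name = "subsystem"; InnerIterations = 1 }
      { Name = "unit"; InnerIterations = 1 } ]

// W lifecycle: the lowest level is iterated twice inside one subsystem cycle.
let wLifecycle =
    [ { Name = "system"; InnerIterations = 1 }
      { Name = "subsystem"; InnerIterations = 2 }
      { Name = "unit"; InnerIterations = 1 } ]

wLifecycle |> phases |> List.iter (printfn "%s")
```

Raising InnerIterations at several levels at once gives the quasi-fractal shape, while the outermost design and test steps (the ones the customer sees) still happen only once.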

I call this “fractal” because it reminds me of the Sierpinski Triangle.

Sierpinski Triangle

The Fractal V Lifecycle is really only quasi-fractal, because there are only a few levels of recursion in a development lifecycle, and because the internal iterations don’t have to be regular or symmetric over the course of the development lifetime.

The Fractal V Lifecycle solves a problem – it lets you do iterative development when your customers only want to be involved at large infrequent milestones.   It gives you the flexibility to adapt your iterations to suit your business conditions and technical environment.   But it also retains the shape of the V, which lets you keep using your existing systems engineering disciplines to comply with customer/regulatory requirements for process assurance.

There’s one thing that the Fractal V Lifecycle doesn’t explicitly decide for you – when should your iterations finish?  This is a major difference between iterations in agile and traditional plan-driven methodologies: agile development approaches have time-boxed iterations (usually between 1 and 4 weeks long), but traditional development approaches have iterations defined by scope.  The Fractal V Lifecycle is consistent with either approach.

Augmented Reality is Here

The classic picture of augmented reality is having a large helmet on your head, wearing funny glasses, and carrying around massive laptop computers and batteries in a backpack.  Your view of the real world is continually overlaid with rich and complex digital information.

But I’ve started to see augmented reality appear in other ways. The first is Livescribe – a digitized pen (well, really a pen-with-camera writing on digitized paper).  It’s quite amazing to see that your hand-written scratchings can be uploaded and then searched electronically on your computer. But even more powerful is that the pen can also record audio, and automatically index those audio segments by the scratchings on the paper.  So it’s great for meetings – you can take notes just like normal, but if you forget what a particular point meant, you can instantly replay the audio from the moment you jotted it down.

Note: I think users will have to be very cautious about the management of audio recorded during meetings, under the constraints imposed by the NSW Surveillance Devices Act 2007, and similar legislation elsewhere.

So Livescribe is a kind of augmented reality – here you can think of “reality” as being the written notes on your paper, and/or the sounds made that got recorded.  The “augmentation” for written notes is the audio stream, and vice-versa.  The written notes are also augmented with a search capability when they’re uploaded to your computer.

The second piece of augmented reality technology is a little more “traditional”, but even in its prototype form it’s been cut back to be lightweight and inexpensive.  It’s work out of the MIT Media Lab, built from a mobile phone, portable projector, and webcam – all commercially-available “off the shelf” components.  See this TED talk for a glimpse.

A quote from William Gibson seems appropriate in this context:

The future is here, it’s just not evenly distributed yet.

Launch of an Enterprise Management Forum

On Monday I saw an entertaining and thought-provoking talk in Sydney by Ivar Jacobson, one of the inventors of UML.  The talk was about agile development – what they don’t teach you in school.  Ivar discussed the eternal problem of developing good software, quickly, at low cost.   There’s a convergence happening between software engineering and enterprise technology, and so the talk was a good way to gather a crowd as part of a soft launch of the Alinement Network – a new online community for enterprise management technology practitioners.

Enterprise management is key for most complex businesses.  The technology powering the operations and management of the enterprise is a big market, and the vendors are good at building communities to support their users.  But surprisingly there aren’t many places on the net where people can discuss the discipline, theories, and technologies in a vendor-neutral space.

So, the Alinement Network could be an interesting space.  The guy behind it is Louis Taborda, a Sydney-based practitioner with a long history working in the enterprise architecture and application lifecycle management tools market. I first met Louis when he was finishing his PhD at MGSM – on configuration management and change management for enterprise systems.  Change (enabled by what Ivar called software “extensibility”) is at the heart of most of the hard problems in enterprise management technology.

Parsing CSV Files in F#

Work has presented me with a small data manipulation exercise. That’s another opportunity to do some more scripting in F#!

This time I’m processing some Comma-Separated Value (CSV) files. CSV files are one of the lowest forms of semi-structured data, used for representing a simple table of data textually. The basic idea is easy – values with commas between them – so CSV files are widely used. You might think parsing them is trivial. It can be like that if you’re lucky, but sometimes the values can contain commas, so then often the values get quoted, but then any quotes in the values have to be escaped. There are many variations on the “basic” idea.
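To see how quickly the “trivial” approach breaks, here is a minimal sketch in F#.  The row is a made-up example in which one value contains a comma and has therefore been quoted; splitting on every comma ignores the quotes and cuts that value in two.

```fsharp
// A hypothetical row: the name contains a comma, so the writer quoted it.
let line = "1732,\"Perez, Juan\",435.00,11-05-2002"

// Naive approach: split on every comma, paying no attention to quotes.
let naiveFields = line.Split(',')

// The row has four values, but the naive split reports five pieces,
// with the quoted name broken across two of them.
printfn "%d fields" naiveFields.Length
for field in naiveFields do
    printfn "[%s]" field
```

Handling this properly means tracking whether you are inside quotes, and then dealing with escaped quotes inside quoted values, which is exactly the kind of detail a tested library already takes care of.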

So, writing a “quick CSV parser” can lead you into a maze of twisty little passages. You don’t want to pull out lex and yacc and roll your own full-blown grammar parser, because the whole point of CSV files is they’re supposed to be lightweight and easy!

Next time you need to write a CSV parser, don’t! You don’t have to reinvent the wheel – other people have already written well-tested libraries you can use.  I’ve been using the open source .NET FileHelpers library in my F# scripting exercise. (I tried the Jet ADO adapter first, but got a strange hard crash I couldn’t be bothered to debug. Anyway...)

It’s easy to use FileHelpers from F#. Here’s how, transliterating the example from the FileHelpers site.  Let’s say this is the file “FileIn.txt”:

1732,Juan Perez,435.00,11-05-2002
554,Pedro Gomez,12342.30,06-02-2004
112,Ramiro Politti,0.00,01-02-2000
924,Pablo Ramirez,3321.30,24-11-2002

First, define a (typed) class to represent a row in the CSV file. For the example, the F# type definition might look like this:

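// FileHelpers supplies the attributes; System supplies DateTime.
open System
open FileHelpers

// Each Customer value corresponds to one row of the file.
[< DelimitedRecord(",") >]
type Customer =
    class
        val CustId : int
        val Name : string
        val Balance : decimal
        // Dates in the file are written as day-month-year.
        [< FieldConverter(ConverterKind.Date, "dd-MM-yyyy") >]
        val AddedDate : DateTime
    end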

Note the use of attributes above (in [< ... >] brackets). These are annotations that are carried into the compiled code, and can be accessed later by other tools using reflection. The attributes on the type above (e.g. DelimitedRecord) control how FileHelpers treats the overall representation of the file, and attributes on each of the fields (e.g. FieldConverter) are used to control the treatment of values in the corresponding columns in the file.

Create a parsing engine based on the type, like so:

let engine = new FileHelperEngine(typeof<Customer>)

and then you’re good to go:

let res = engine.ReadFile("FileIn.txt")

Actually, there is a wrinkle here.  res is an obj array, but you’d prefer it to be a Customer array.  You can’t use the ordinary F# dynamic downcast directly on the array, because as far as F# is concerned the array type obj[] is not itself a supertype of Customer[]; only its element type, obj, is a supertype of Customer.  So you need to write and use an auxiliary type-casting function that downcasts each element, like this:

let downcast_Customer_Array = Array.map (fun (a:obj) -> a :?> Customer)
let res_Customers = downcast_Customer_Array res
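If you have several record types, the same trick can be written once as a generic helper.  The sketch below is self-contained and doesn’t touch FileHelpers at all: downcastArray and the Point record are hypothetical names invented for the illustration, and the obj array is built by hand to stand in for what a reader might return.

```fsharp
// Generic version of the helper above: converts an obj array to a 'T array
// by unboxing each element. Throws InvalidCastException on a wrong element.
let downcastArray<'T> (xs: obj[]) : 'T[] =
    xs |> Array.map (fun x -> unbox<'T> x)

// Stand-in record type, used only for this illustration.
type Point = { X: int; Y: int }

// Hand-built obj array playing the role of the untyped result.
let boxedPoints : obj[] = [| box { X = 1; Y = 2 }; box { X = 3; Y = 4 } |]

let points : Point[] = downcastArray<Point> boxedPoints
printfn "%d" points.[1].X   // prints 3
```

With this in scope, the Customer case would read downcastArray<Customer> res.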

You end up with an array of your values in your newly defined type, which you can use in the ordinary way, e.g. the date for the first customer is:

res_Customers.[0].AddedDate

Easy, huh? Much easier than writing your own parser.

FileHelpers has a few other tricks if you need them.  I’ve been using extra converter attributes to tell FileHelpers that some fields are quoted, and to help parse my dates.  I’ve also been using a custom converter to parse a value which was itself a comma-separated list of values.  (The only wrinkle there was not being able to use F# lists as .NET objects – I had to go via ResizeArray objects instead.)

A Medium Communicates with the Spirit of Blogging

The death of the blogosphere, much discussed earlier this year, is surely exaggerated.  OK, my blog was “resting” for most of this year. My excuse is general busy-ness – moving house, moving office, and changing roles at NICTA. But recently I had the enthusiasm and time to write a flurry of blog entries. (I’ve had more time during my newly extended train commute, but notwithstanding that, I can see my blogging enthusiasm comes in bursts…)

Isn’t this how most of the blogosphere works? Most bloggers are amateurs, writing about their family, pets, or hobbies. The glamorous fantasy of blogging driving democratic journalism and incisive public commentary is true, but it’s only ever been true for a tiny part of the whole.

It takes a certain perverse commitment to blog regularly if you’re not being paid for it. Of course increasingly, some bloggers do get paid for it – either as journalists, company employees or, for an elite influential few, through significant online ad revenue. But the fact that some people are paid for it doesn’t significantly affect the cost or value of blogging for the mass of amateurs.

So Nic Carr’s wrong to say that blogging “outside the bounds of the traditional media is gone” – the blogosphere is not dead. Yes, as the Economist says “Blogging has entered the mainstream” – blogging is now accepted as part of the spectrum of modern media. But non-mainstream blogging hasn’t died. There are still plenty of blogs about family, pets, hobbies, and there are still individuals reporting and providing independent social commentary.

Ironically, at the same time as blogging technology is becoming accepted by the mainstream media, it’s also becoming accepted for other more industrial purposes. Blogging technology is no longer just for blogging – the formats that support blogs (RSS, Atom, etc) are used as REST representations of time-series content, including mundane things such as home loan product announcements.

Blogging is certainly not dead; both socially and technically, it is growing and adapting. It used to be the message, now increasingly it’s the medium.

ICSOC Day 3 Keynote – Infrastructure as a Service

I had to miss the second day of ICSOC, but was back for the morning of the third, and another great keynote, on Web-Scale Computing, from Peter Vosshall – a VP and Distinguished Engineer at Amazon. Amazon needs a highly reliable and scalable infrastructure internally to run its retail business, but has also been selling web services infrastructure to third parties. Peter spoke about EC2 (compute), SQS (messaging), S3 (storage of blobs with metadata), SimpleDB (storage of lightly-structured data with indexed queries), and EBS (storage for EC2 when you need a traditional filesystem or database).

As an example of how companies are using and benefitting from these services, he talked about a company called Animoto.  On their website you can upload a song and some photos, and they automatically build a video montage, matching transitions to beats.  They started with around 5000 customers in total, but after they built a Facebook app and got some viral awareness, they shot up to 5000 to 10000 users per hour. They had deployed on EC2 and ramped up to 3500 – 5000 instances.  It looked like a neat story.

The business benefits of using the web services are having a capability for fast incremental infrastructure growth, and turning what would have been fixed capital expenses into variable operating expenses.  (Coincidentally I had also mentioned this latter benefit of web services in a podcasted interview I was in on Monday.)

As well as supplying web services, Amazon’s using them internally too.  Peter briefly reviewed how Amazon started as what looked like a 2-tier+web client-server web-application, but then refactored that incrementally (and painfully over 2002-2003) into a collection of services. They’ve seen reliability benefits – he said they can lose an entire data centre with no impact on the customer experience.  They’ve also had product management benefits – each service maintains its own data and operating responsibility, which lets them each evolve at their own pace. Amazon’s key non-functional properties (NFPs) are security, incremental scalability, availability (systems don’t fail just by stopping, and failures aren’t independent), performance (not just mean performance but also performance in outlying cases), and cost-effectiveness.

Intriguingly, despite the claimed product management benefits, he said that 70% of development time was spent on “undifferentiated heavy lifting” delivering updated services – dealing with non-functional, administrative service management issues.  So only 30% of their effort is spent improving the customer experience. I think their delivered experience could certainly use some extra work, especially for their non-US customers!