Archive for the ‘Software’ Category.

Augmented Reality is Here

The classic picture of augmented reality is having a large helmet on your head, wearing funny glasses, and carrying around massive laptop computers and batteries in a backpack.  Your view of the real world is continually overlaid with rich and complex digital information.

But I’ve started to see augmented reality appear in other ways. The first is Livescribe – a digitised pen (well, really a pen with a camera, writing on digitised paper).  It’s quite amazing to see that your hand-written scratchings can be uploaded and then searched electronically on your computer. But even more powerful is that the pen can also record audio, and automatically index those audio segments by the scratchings on the paper.  So it’s great for meetings – you can take notes just like normal, but if you forget what a particular point meant, you can instantly replay the audio from the moment you jotted it down.

Note: I think users will have to be very cautious about the management of audio recorded during meetings, under the constraints provided by the NSW Surveillance Devices Act 2007, and similar legislation elsewhere.

So Livescribe is a kind of augmented reality – here you can think of “reality” as being the written notes on your paper, and/or the sounds made that got recorded.  The “augmentation” for written notes is the audio stream, and vice-versa.  The written notes are also augmented with a search capability when they’re uploaded to your computer.

The second piece of augmented reality technology is a little more “traditional”, but even in its prototype form it’s been cut back to be lightweight and inexpensive.  It’s work from the MIT Media Lab, built from a mobile phone, portable projector, and webcam – all commercially-available “off the shelf” components.  See this TED talk for a glimpse.

A quote from William Gibson seems appropriate in this context:

The future is here, it’s just not evenly distributed yet.

Launch of an Enterprise Management Forum

On Monday I saw an entertaining and thought-provoking talk in Sydney by Ivar Jacobson, one of the inventors of UML.  The talk was about agile development – what they don’t teach you in school.  Ivar discussed the eternal problem of developing good software, quickly, at low cost.   There’s a convergence happening between software engineering and enterprise technology, and so the talk was a good way to gather a crowd as part of a soft launch of the Alinement Network – a new online community for enterprise management technology practitioners.

Enterprise management is key for most complex businesses.  The technology powering the operations and management of the enterprise is a big market, and the vendors are good at building communities to support their users.  But surprisingly there aren’t many places on the net where people can discuss the discipline, theories, and technologies in a vendor-neutral space.

So, the Alinement Network could be an interesting space.  The guy behind it is Louis Taborda, a Sydney-based practitioner with a long history working in the enterprise architecture and application lifecycle management tools market. I first met Louis when he was finishing his PhD at MGSM – on configuration management and change management for enterprise systems.  Change (enabled by what Ivar called software “extensibility”) is at the heart of most of the hard problems in enterprise management technology.

Parsing CSV Files in F#

Work has presented me with a small data manipulation exercise. That’s another opportunity to do some more scripting in F#!

This time I’m processing some Comma-Separated Value (CSV) files. CSV files are one of the lowest forms of semi-structured data, used for representing a simple table of data textually. The basic idea is easy – values with commas between them – so CSV files are widely used. You might think parsing them is trivial. It can be like that if you’re lucky, but sometimes the values can contain commas, so then often the values get quoted, but then any quotes in the values have to be escaped. There are many variations on the “basic” idea.

So, writing a “quick CSV parser” can lead you into a maze of twisty little passages. You don’t want to pull out lex and yacc and roll your own full-blown grammar parser, because the whole point of CSV files is they’re supposed to be lightweight and easy!
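To make the quoting problem concrete, here is a minimal sketch of how a naive split goes wrong. The input line is a hypothetical variation on the sample data used later in this post, with the name written as a quoted "surname, first name" value.

```fsharp
// Hypothetical input: the second value is quoted because it contains a comma.
let line = "554,\"Gomez, Pedro\",12342.30,06-02-2004"

// A naive split treats every comma as a separator, including the one inside the quotes.
let naiveFields = line.Split(',')

// The row has four values, but the naive split produces five pieces,
// and the quote characters are still embedded in two of them.
printfn "%d fields: %A" naiveFields.Length naiveFields
```

Handling this properly means tracking whether you are inside a quoted value, and then dealing with escaped quotes as well, which is where the twisty little passages begin.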

Next time you need to write a CSV parser, don’t! You don’t have to reinvent the wheel – other people have already written well-tested libraries you can use.  I’ve been using the open source .NET FileHelpers library in my F# scripting exercise. (I tried the Jet ADO adapter first, but got a strange hard crash I couldn’t be bothered to debug. Anyway..)

It’s easy to use FileHelpers from F#. Here’s how, transliterating the example from the FileHelpers site.  Let’s say this is the file “FileIn.txt”:

1732,Juan Perez,435.00,11-05-2002
554,Pedro Gomez,12342.30,06-02-2004
112,Ramiro Politti,0.00,01-02-2000
924,Pablo Ramirez,3321.30,24-11-2002

First, define a (typed) class to represent a row in the CSV file. For the example, the F# type definition might look like this:

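open System
open FileHelpers

// One record per row of the file; fields are declared in column order.
[< DelimitedRecord(",") >]
type Customer =
    class
        val CustId : int
        val Name : string
        val Balance : decimal
        // Dates in the file are written day-month-year, e.g. 24-11-2002.
        [< FieldConverter(ConverterKind.Date, "dd-MM-yyyy") >]
        val AddedDate : DateTime
    end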

Note the use of attributes above (in [< ... >] brackets). These are annotations that are carried into the compiled code, and can be accessed later by other tools using reflection. The attributes on the type above (e.g. DelimitedRecord) control how FileHelpers treats the overall representation of the file, and attributes on each of the fields (e.g. FieldConverter) are used to control the treatment of values in the corresponding columns in the file.

Create a parsing engine based on the type, like so:

let engine = new FileHelperEngine(typeof<Customer>)

and then you’re good to go:

let res = engine.ReadFile("FileIn.txt")

Actually, there is a wrinkle here.  res is an obj array, but you’d prefer it to be a Customer array.  You can’t use the ordinary F# dynamic downcast directly, because the array isn’t a super-type itself (its type parameter is, here).  So you need to write and use an auxiliary type-casting function, like this:

let downcast_Customer_Array = Array.map (fun (a:obj) -> a :?> Customer)
let res_Customers = downcast_Customer_Array res
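The element-wise cast can be written in a few equivalent ways. The sketch below uses a stand-in obj array of strings rather than real FileHelpers output, so it runs without the library; the names boxed, viaDowncast, viaUnbox and viaCast are made up for illustration.

```fsharp
// Stand-in data: an obj array whose elements are really strings.
let boxed : obj[] = [| box "Juan Perez"; box "Pedro Gomez" |]

// Element-wise dynamic downcast, in the same style as downcast_Customer_Array.
let viaDowncast = boxed |> Array.map (fun (a: obj) -> a :?> string)

// The same conversion using the built-in unbox function.
let viaUnbox = boxed |> Array.map unbox<string>

// The same conversion using Seq.cast, then converting back to an array.
let viaCast = boxed |> Seq.cast<string> |> Seq.toArray
```

All three throw an InvalidCastException at run time if an element is not of the target type, so none of them is safer than the others; it is mostly a matter of taste.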

You end up with an array of your values in your newly defined type, which you can use in the ordinary way, e.g. the date for the first customer is:

res_Customers.[0].AddedDate

Easy, huh? Much easier than writing your own parser.
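A side note on the "dd-MM-yyyy" format given to the date converter: assuming it is interpreted as an ordinary .NET custom date format string, you can check how a value from the sample file would be read without involving FileHelpers at all. This is only a sketch of that assumption, not a description of what the library does internally.

```fsharp
open System
open System.Globalization

// "11-05-2002" is the date from the first row of the sample file.
// With "dd-MM-yyyy" the first pair of digits is the day and the second is the month.
let added = DateTime.ParseExact("11-05-2002", "dd-MM-yyyy", CultureInfo.InvariantCulture)

// Day 11, month 5, year 2002.
printfn "day=%d month=%d year=%d" added.Day added.Month added.Year
```

Being explicit about the format matters for dates like this one, which would be a different (but still valid) date if read month-first.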

FileHelpers has a few other tricks if you need them.  I’ve been using extra converter attributes to tell FileHelpers that some fields are quoted, and to help parse my dates.  I’ve also been using a custom converter to parse a value which was itself a comma-separated list of values.  (The only wrinkle there was not being able to use F# lists as .NET objects – I had to go via ResizeArray objects instead.)
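The ResizeArray detour looks roughly like this. It is a sketch of the conversion step only, without the FileHelpers converter wiring; parseInnerList and the sample value are hypothetical names and data for illustration.

```fsharp
// Hypothetical helper: split an inner comma-separated value into a ResizeArray
// (the F# name for System.Collections.Generic.List<'T>).
let parseInnerList (value: string) : ResizeArray<string> =
    let parts = value.Split(',') |> Array.map (fun s -> s.Trim())
    ResizeArray<string>(parts)

// Hypothetical field value that is itself a comma-separated list.
let tags = parseInnerList "red, green,blue"

// Once back in F# code, the ResizeArray converts to an ordinary F# list.
let tagList = List.ofSeq tags
```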

A Medium Communicates with the Spirit of Blogging

The discussion earlier this year about the death of the blogosphere is surely exaggerated.  OK, my blog was “resting” for most of this year. My excuse is general busy-ness – moving house, moving office, and changing roles at NICTA. But recently I had the enthusiasm and time to write a flurry of blog entries. (I’ve had more time during my newly extended train commute, but notwithstanding that, I can see my blogging enthusiasm comes in bursts…)

Isn’t this how most of the blogosphere works? Most bloggers are amateurs, writing about their family, pets, or hobbies. The glamorous fantasy of blogging driving democratic journalism and incisive public commentary is true, but it’s only ever been true for a tiny part of the whole.

It takes a certain perverse commitment to blog regularly if you’re not being paid for it. Of course increasingly, some bloggers do get paid for it – either as journalists, company employees or, for an elite influential few, through significant online ad revenue. But the fact that some people are paid for it doesn’t significantly affect the cost or value of blogging for the mass of amateurs.

So Nic Carr’s wrong to say that blogging “outside the bounds of the traditional media is gone” – the blogosphere is not dead. Yes, as the Economist says “Blogging has entered the mainstream” – blogging is now accepted as part of the spectrum of modern media. But non-mainstream blogging hasn’t died. There are still plenty of blogs about family, pets, hobbies, and there are still individuals reporting and providing independent social commentary.

Ironically, at the same time as blogging technology is becoming accepted by the mainstream media, it’s also becoming accepted for other more industrial purposes. Blogging technology is no longer just for blogging – the formats that support blogs (RSS, Atom, etc) are used as a REST representation for time-series content, including mundane things such as home loan product announcements.

Blogging is certainly not dead; both socially and technically, it is growing and adapting. It used to be the message, now increasingly it’s the medium.

ICSOC Day 3 Keynote – Infrastructure as a Service

I had to miss the second day of ICSOC, but was back for the morning of the third, and another great keynote, on Web-Scale Computing, from Peter Vosshall – a VP and Distinguished Engineer at Amazon. Amazon needs a highly reliable and scalable infrastructure internally to run its retail business, but has also been selling web services infrastructure to third parties. Peter spoke about EC2 (compute), SQS (messaging), S3 (storage of blobs with metadata), SimpleDB (storage of lightly-structured data with indexed queries), and EBS (storage for EC2 when you need a traditional filesystem or database).

As an example of how companies are using and benefitting from these services, he talked about a company called Animoto.  On their website you can upload a song and some photos, and they automatically build a video montage, matching transitions to beats.  They started with around 5000 customers in total, but after they built a Facebook app and got some viral awareness, they shot up to 5000 to 10000 users per hour. They had deployed on EC2 and ramped up to 3500 – 5000 instances.  It looked like a neat story.

The business benefits of using the web services are having a capability for fast incremental infrastructure growth, and turning what would have been fixed capital expenses into variable operating expenses.  (Coincidentally I had also mentioned this latter benefit of web services in a podcasted interview I was in on Monday.)

As well as supplying web services, Amazon’s using them internally too.  Peter briefly reviewed how Amazon started as what looked like a 2-tier+web client-server web-application, but then refactored that incrementally (and painfully over 2002-2003) into a collection of services. They’ve seen reliability benefits – he said they can lose an entire data centre with no impact on the customer experience.  They’ve also had product management benefits – each service maintains its own data and operating responsibility, which lets them each evolve at their own pace. Amazon’s key non-functional properties (NFPs) are security, incremental scalability, availability (systems don’t fail simply by stopping, and failures aren’t independent), performance (not just mean performance but also performance in outlying cases), and cost-effectiveness.

Intriguingly, despite the claimed product management benefits, he said that 70% of development time was spent on “undifferentiated heavy lifting” delivering updated services – dealing with non-functional, administrative service management issues.  So only 30% of their effort is spent improving the customer experience. I think their delivered experience could certainly use some extra work, especially for their non-US customers!

ICSOC Day 1 Keynote – Services for Science

The 6th International Conference on Service Oriented Computing is on in Sydney this week. NICTA is a sponsor, and I managed to score a registration to attend.  Ian Foster opened with an interesting keynote. (Preceded by a 30 minute delay fussing with Mac technology issues!)  He spoke on “Services for Science” – how SOA is being used to support knowledge creation in science. Currently there’s a surprisingly strong growth of online services providing data and analysis, in astronomy and especially in the biomedical field.  He talked about the caGrid network. Ontologies are key there for meta-data of experimental results – Ian commented that the community is very “neat” (not scruffy) in being explicit and standardised in the representation and organisation of their data.

It’s interesting that for representing scientific workflow they’ve dropped BPEL in favour of the workflow notation and supporting infrastructure in Taverna. The workflows are used not only to coordinate data and analyses, but also to communicate methods and in principle to promote reuse. But the caGrid leaders recognise that it’s hard to design for workflow reuse, and hard to achieve reuse in practice.  Ian also discussed experimental use of functional programming techniques to support provenance – to capture computations as a first class entity for scientific audit, review, and mining. He finished with some discussion of scalability and text mining of research publications.

I think there are interesting analogues of some of the issues now being explored in the e-science domain that have already been thrashed out in software engineering. They are quite similar in some ways – in the two fields of practice at an industrial scale, there are teams of knowledge workers working on complex and partly-shared electronic assets. Large scale reuse and variation has been made methodical in Software Product Line Engineering, and provenance issues are very similar to those that are well known in the established discipline of (Software) Configuration Management.

COAG Invests in a National Electronic Conveyancing System

COAG met on Saturday and decided to invest to implement a national approach to conveyancing – the National Electronic Conveyancing System (NECS).  Currently each of Australia’s eight states and territories has its own different system for dealing with the transfer of real estate.  You might not think that’s a big deal – after all, wherever you are, the house you buy is only going to be in one state!  Why does it matter to have a uniform national system?

At an abstract level from the public’s point of view, when you buy a house, there’s just a buyer, a seller, and a central land registry that maintains the “golden truth” about ownership under the standard Torrens system of title.  It’s a little more complicated than that because mortgages for housing loans are also registered with land registries.  So banks and non-bank lenders are normally involved too.  It’s more complicated again, because there’s a whole raft of other auxiliary entities involved in title exchange, such as title search companies, lawyers, property valuers, and insurers providing related services.  The whole industry (banks, non-bank lenders, and the auxiliary service organisations) operates nationally.  Currently they need to implement and maintain systems to deal with the land registry systems in each of the eight states and territories.  In the past conveyancing has been a manual process, and human processors have been able to deal with the inefficiencies of working with multiple interfaces.

However, access to land registries is starting to move online, to reduce the cost and time of buying real-estate.  When conveyancing becomes automated, there’s a large initial cost borne by everyone in the industry to integrate with the new system(s).  Companies would prefer to pay this initial overhead cost once, not eight times!

NECS is intended to address this problem.  The goal is not to create a single national land registry, but instead to create a single national interface to all of the state and territory land registries. Organisations will be able to integrate with the national interface, and gain access to the land registries in every state.

Our group at NICTA has been working with NECS, looking at issues in the definition and management of business vocabularies, business rules, and business processes.  NICTA’s research philosophy is “use-inspired research” – working on fundamental scientific advances and technology innovations in the context of, and with an understanding of, real-world problems.   The goal is to do research that has more impact, and benefits Australia.  Our work with NECS is an example of all of this.   It’s still early days, but having a deep engagement with conveyancing and e-government has already been important to motivate and direct the research we’re doing.

Computer Science vs Software Engineering

My University education is in Computer Science, but my professional life and renewed research career is in Software Engineering.  A lot of people (and perhaps some University departments!) probably think these are just the same thing, with different names.  But in my transition to Software Engineering I’ve discovered they’re very different, and I think their difference is not all down to the normal arguments about science vs engineering.

In Computer Science, the “unit of analysis” is the procedure (in the sense of effective procedure, but I also mean to include non-terminating processes).  Entities of interest include algorithms and data-structures, interfaces, ADTs, types, and languages for expressing them.

In contrast, in Software Engineering, the unit of analysis is the whole software system.  Here the entities of interest include architectures, and system models. A whole software system is not just “bigger” in size than a single procedure/process.  It also has many more different kinds of functionality, many more developers, and many different users and other stakeholders.

There are a lot of common themes across Computer Science and Software Engineering.  For example, both are concerned with issues such as specification, construction, distribution, performance analysis, and verification.

The challenges for Software Engineering are not just dealing with the scale of the system, but also dealing with the scale of the development of the system. The challenges are not just technical, they’re also socio-technical.  So although Computer Science and Software Engineering both deal with software and have many common themes, their technologies and methodologies are usually quite different because they’re dealing with different kinds of entities in different contexts.

Computer Science and Software Engineering are very different disciplines.

Lessons on Standards from Build Management

I’ve written an article for the latest edition of the CM Journal at CM Crossroads, on Four Lessons about Company Standards and Procedures from Build Management. Writing in a practitioner’s forum is very different to writing academic papers! I’m not sure I’ve completely got the hang of it yet… but it’s fun trying.

CM Best Practices – Two Lists and One of Mine

A colleague forwarded to me a pointer to the current issue of the ITMPI journal, on Best Practices in Configuration Management. They make an interesting contrast with the best CM practices in the November issue of the CM Journal on “What Best Practice Is Best?”, and specifically the article CM: The Next Generation of Top 10 Best Practices.

The CM Journal article’s list is as follows:

  • 1. Use of Change Packages
  • 2. Stream-based Branching Strategy – do not overload branching
  • 3. Status flow for all records with Clear In Box Assignments
  • 4. Data record Owner and Assignee
  • 5. Continuous integration with automated nightly builds from the CM repository
  • 6. Dumb numbering
  • 7. Main branch per release vs Main Trunk
  • 8. Enforce change traceability to Features/Problem Reports
  • 9. Automate administration to remove human error
  • 10. Tailor your user interface closely to your process
  • 11. Org chart integrated with CM tool
  • 12. Change control of requirements
  • 13. Continuous Automation
  • 14. Warm-standby disaster recovery
  • 15. Use Live data CRB/CIB meetings
  • 16. A Problem is not a problem until it’s in the CM repository
  • 17. Use tags and separate variant code into separate files
  • 18a. Separate Problems/Issues/Defects from Activities/Features/Tasks
  • 18b. Separate customer requests from Engineering problems/features
  • 19. Change promotion vs Promotion Branches
  • 20. Separate products for shared code

The ITMPI Journal article lists just 3 best practices:

The ITMPI Journal article doesn’t really cover practices 6, 9, 10, 14, 17, and 20, partly because the CM Journal article’s practices are at a lower level of detail and cover a few practical/administrative issues outside of the core of CM theory.

Anyway, the practices in both these lists are all “good” for “most” development groups. They aren’t rocket science, but I’d say there are a fair number of development groups out there that barely do any of them. Although version control is ubiquitous in industry, CM is still very poorly understood by most practitioners.

For what it’s worth, I think that for most commercial software development, the most important practice isn’t about sophisticated version control issues or processes – it’s a simple but critical piece of organisational structure and policy.

All Customer Deliveries Go Through the Release Team
Create a release team that’s different to the development team. Ensure all customer software deliveries happen through the release team. Have the most senior manager you can find declare it a firing offense for a developer to ship code or patches directly from their desktop to the customer. Even in “emergency” situations.

More than once I’ve seen or heard of companies that own high-end commercial SCM tools, but make those tools worthless because developers sometimes email patches directly to the customer. (Sometimes in the rush they also forget to check in their changes to version control, which just makes the nightmare worse.) The developers are usually trying to “get the job done”, and be “helpful” to a customer with a problem. Sadly, they end up causing bigger problems for everyone down the line. If you’re not managing releases properly, you don’t know what code the customer has – you degrade your ability to diagnose faults, you won’t be able to ship working patches in future, and you might easily introduce a regression problem by losing the fix itself.