Friday, April 8, 2011

Is Data like Timber?

For quite some time, I have been playing with the idea of comparing data with a raw material (like timber) and information with a product made from that raw material (like furniture). This helps to suggest (a) that data and information are related but not the same thing and (b) that there is processing in between, similar to converting wood to furniture. With this blog post, I would like to throw this comparison out into the public, hopefully to trigger a good discussion that either identifies weaknesses of the comparison or develops the idea even further.

So let's picture the process of how a tree becomes a cupboard, a bookshelf or a table:

  1. Trees grow in the forest.
  2. A tree is cut and the log is transported to a factory for further processing.
  3. At the factory, it is stored in some place.
  4. It is subsequently processed into boards. Various tools like saws, presses etc. are used in this context.
  5. The boards are frequently taken to yet another factory that applies various processing steps to create the furniture. Depending on the type of furniture (table, chair, cupboard, ...), more or fewer steps, of greater or lesser complexity, are necessary. Additional materials like glass, screws, nails, handles, metal joints, paint, ... are added. Processing steps include cutting, pressing, painting, drilling, ...

Now, when you consider what happens to data before it becomes useful information displayed in a pivot table or a chart, you can identify similar steps:

  1. Data gets created by some business process, e.g. a customer orders some product.
  2. For analysis, the data is brought to some central place for further processing in a calculation engine. This place can be part of a data warehouse or of an on-the-fly infrastructure, e.g. via a federated approach that retrieves the data only when it is needed.
  3. At this central place, it is stored, e.g. in persistent DB tables or a cache, where it can potentially "meet" data from other sources.
  4. Data is reformatted, harmonised, cleansed, ... using data quality tools, data transformation tools or plain SQL. Simply consider the various formats for a date like 4/5/2011, 5 Apr 2011, 5.4.11, 20110405, ...
  5. Data is enriched and combined with data from other sources, e.g. the click stream of your web server combined with the user master data table. Only in this combination can you, for instance, tell how many young, middle-aged or old people look at your web site. In the end, data has become useful information.
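
To make steps 4 and 5 a bit more tangible, here is a minimal sketch in Python. It is only an illustration under a few assumptions of mine: slash dates are taken to be month-first (so 4/5/2011 means 5 April 2011), the age bands are arbitrary, and the function names (`harmonise_date`, `age_group`, `visits_by_age_group`) as well as the sample records are made up for this example. A real setup would more likely use a data transformation tool or SQL, as mentioned above.

```python
from collections import Counter
from datetime import date, datetime

# Assumed source formats, tried in order. "%m/%d/%Y" assumes that slash dates
# are month-first (US style); a day-first source would need "%d/%m/%Y" instead.
DATE_FORMATS = ["%m/%d/%Y", "%d %b %Y", "%d.%m.%y", "%Y%m%d"]


def harmonise_date(raw: str) -> date:
    """Step 4: convert a date string in any of the assumed formats to a date."""
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(raw.strip(), fmt).date()
        except ValueError:
            continue
    raise ValueError(f"unrecognised date format: {raw!r}")


def age_group(age: int) -> str:
    """Hypothetical age bands; the thresholds are illustrative only."""
    if age < 30:
        return "young"
    if age < 60:
        return "middle aged"
    return "old"


def visits_by_age_group(clicks, users):
    """Step 5: enrich click records with user master data and count per group."""
    counts = Counter()
    for click in clicks:
        user = users.get(click["user_id"])
        if user is None:
            continue  # click from a user missing in the master data
        counts[age_group(user["age"])] += 1
    return counts


# Made-up click stream records with inconsistent date formats.
clicks = [
    {"user_id": 1, "date": "4/5/2011"},
    {"user_id": 2, "date": "5 Apr 2011"},
    {"user_id": 1, "date": "5.4.11"},
    {"user_id": 3, "date": "20110405"},
]
# Made-up user master data, keyed by user id.
users = {1: {"age": 24}, 2: {"age": 45}, 3: {"age": 67}}

for click in clicks:
    click["date"] = harmonise_date(click["date"])

print({c["date"].isoformat() for c in clicks})   # {'2011-04-05'}
print(dict(visits_by_age_group(clicks, users)))  # {'young': 2, 'middle aged': 1, 'old': 1}
```

The four differently formatted dates all collapse into the same day once harmonised, and the age distribution only becomes visible after the click stream has "met" the user master data.
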

Hopefully, the similarity between the two processing environments has become apparent. In the end, steps 1 to 5 describe a layered scalable architecture (LSA). Depending on the situation, steps will be much simpler or can even be omitted, just as it depends on what type of furniture you want to produce: a bookshelf needs less processing than a sophisticated cupboard.

I guess that one can now start to play with the analogies. Take operational reporting, i.e. reporting on data from one source, e.g. one single process ("Which orders have been submitted today?"). This is probably tantamount to producing boards or shelves: the log gets cut by a saw, maybe pressed, and that's almost it. One can imagine that this is even done in real time, i.e. directly after the tree has been cut. In contrast, producing a sophisticated analysis (i.e. a cupboard) takes more processing steps and, specifically, involves more data from outside (i.e. materials like screws, paint, joints, glass, ...). Similarly, one could spin the idea further to find analogies for data warehouses, data mining, dashboards, data marts etc.
