I went to the All Your Base database conference last week, and while I was there it struck me that people mean really, really different things when they talk about data, even within the same business. My background is in data analysis, so until very recently ‘data’, to me, meant enterprise data or business intelligence (BI) data. (Of course, before that, ‘data’ for me only meant qualitative data. But that’s a story for another time.) I now work much more closely with a technical team, and it’s become obvious that they want to get entirely different things out of data. The kind of data that keeps the website running is crucial for the developers and the IT infrastructure team, but it’s rarely of interest to the marketing, finance, sales and content development teams.
It surprised me that at this conference only two speakers touched on the business kind of data: Laura Thomson of Mozilla Corp and Alistair Hann of Skyscanner. Thomson posited that BI data can work as a very high-level monitoring system: if your traffic-to-purchase conversion rates suddenly go haywire, there may be a technical problem somewhere. She also made the very good point that it’s much better to know what numbers people expect to see, notice when those numbers go wrong, and start investigating before the other business departments start calling you up and panicking at you, because that is never fun. Hann described how Skyscanner uses near-real-time stream processing for reporting on enterprise data. Many companies use batch processing systems to handle massive amounts of data, but depending on the size of the query, these can take a variable amount of time to return answers. That’s fine if the reporting isn’t time-sensitive, but for things like real-time monitoring alerts it’s much less useful.
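To make the idea of using a business metric as a technical alarm a bit more concrete, here is a minimal sketch of a conversion-rate monitor that processes visits one at a time as they arrive (stream-style, rather than waiting for a batch query). This is not how Mozilla or Skyscanner actually do it; the class and field names, the 10-minute window, the expected rate and the tolerance are all hypothetical assumptions chosen for illustration.

```python
from collections import deque
from dataclasses import dataclass
from typing import Optional


@dataclass
class Visit:
    timestamp: float  # seconds since some start point (hypothetical event format)
    purchased: bool   # True if this visit ended in a purchase


class ConversionMonitor:
    """Hypothetical sketch: tracks traffic-to-purchase conversion over a
    sliding time window and flags when it drifts outside an expected band."""

    def __init__(self, window_seconds: float, expected_rate: float,
                 tolerance: float, min_visits: int = 100):
        self.window_seconds = window_seconds
        self.expected_rate = expected_rate  # the number people expect to see
        self.tolerance = tolerance          # how far off counts as "haywire"
        self.min_visits = min_visits        # avoid alerting on tiny samples
        self._visits: deque = deque()
        self._purchases = 0

    def record(self, visit: Visit) -> None:
        # Update running counts incrementally as each event arrives.
        self._visits.append(visit)
        if visit.purchased:
            self._purchases += 1
        self._evict(visit.timestamp)

    def _evict(self, now: float) -> None:
        # Drop visits older than the window so counts reflect recent traffic only.
        cutoff = now - self.window_seconds
        while self._visits and self._visits[0].timestamp <= cutoff:
            old = self._visits.popleft()
            if old.purchased:
                self._purchases -= 1

    def rate(self) -> Optional[float]:
        if not self._visits:
            return None
        return self._purchases / len(self._visits)

    def is_anomalous(self) -> bool:
        # Not enough visits in the window: too little evidence to raise an alert.
        if len(self._visits) < self.min_visits:
            return False
        return abs(self.rate() - self.expected_rate) > self.tolerance


if __name__ == "__main__":
    monitor = ConversionMonitor(window_seconds=600, expected_rate=0.03, tolerance=0.015)
    # Normal traffic: roughly 1 purchase every 33 visits.
    for t in range(300):
        monitor.record(Visit(t, purchased=(t % 33 == 0)))
    print(monitor.rate(), monitor.is_anomalous())
    # Simulated technical fault: checkout silently stops working.
    for t in range(300, 900):
        monitor.record(Visit(t, purchased=False))
    print(monitor.rate(), monitor.is_anomalous())
```

The point of the sliding window is that the alert reacts within minutes of conversions collapsing, which is exactly the case where waiting for a batch report would leave you hearing about the problem from another department first.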
A common theme across many of the talks was the constant, though not necessarily synchronous, co-evolution of software and hardware solutions for managing ever-larger amounts of data. Because each side advances incrementally, both disk storage and cloud storage come with their own challenges. So do the traditional and newer methods for querying data (relational databases on the one hand, and on the other all the newfangled stuff like graph databases and search engines and…magic data unicorns and whatnot). Contrary to what many people currently believe, cloud storage and the new methods for accessing data stored there aren’t necessarily magic elixirs for the problems of disk storage. Judging from how often this came up, there is a pressing need for something that combines the advantages of disk and cloud storage (and of relational databases and newer methods), or at least allows for less drastic trade-offs.
Data is a funny thing, because at the end of the day what really matters is not the data itself but the meaning you interpret from it. Data should ultimately be a call to action, and those actions could range from repair and maintenance to developing business strategy. Clearly, different kinds of data need to be collected to make the right interpretations and form the right plans, and a number of methods can contribute to that knowledge. The conference certainly covered some interesting developments in how data can be stored and interrogated, as well as some problems in those areas that still need to be addressed.