Proceed with Caution: the Dark Side of Data

So recently my life has been full of tech talks. First I went to the Girl Geek Dinners Social and Economic Development Through Technology series of talks. I was a little nervous about going, actually, because it’s geared towards women in technology and I thought I might not be enough of a programmer to fit in. But my work is pretty nerdy (databases ahoy!) so I relaxed pretty quickly. In any case, it’s always worth going along to an event that’s a little outside your comfort zone, because it gives you a chance to see how other people work and you can usually take something useful from what they have to say. I also like getting to see other people’s workspaces–it reminds me not to get too bogged down in one idea of how work can happen.

I was really excited about the theme of the talks–technology in society is right up my alley. I wanted to hear how technologists were engaging with technology users, how they were finding out what people needed and applying that knowledge. I wanted to hear a big discussion about the relationship between technology and society as a whole, about the unexpected behavior of users, about points of doubt, about uncertainty and bad uses of technology–not just all the cool stuff people and their companies were doing. Basically I wanted to hear a philosophical discussion about what technology is for.

There was less of that than I had hoped, but an interesting (and, I’m sure, for many people in the audience unexpected) discussion emerged around ethics. Unfortunately the person who raised it phrased it in a way that didn’t show her point to best effect, but I thought it was a really interesting point, and one that is often sublimated in conversations about large data sets. Essentially, she pointed out the potential for massive, freely available data sets to be used in unintended ways, or even in really negative ways.

This thread was picked up, over and over again, in presentations at the Research Methods Festival I attended last week. It is so much easier now than at any time in the past to gain access to massive amounts of data, much of which is automatically captured online. This raises a lot of new challenges. Many, though not all, have to do with ethics: how do we obtain informed consent? Is anyone actually reading those terms and conditions? Do they realize that we’ll be able to build a geolocating Google Street View of Twitter data, for example?

Another emerging problem for researchers is the complexity and the quality of data gathered in new ways. How do you manage the Niagara Falls of data that is now gushing forth from the internet? How do you make sure it’s consistent? How do you measure its accuracy?

A method shared between commercial and academic research is the good old-fashioned interview. Interviews can be face-to-face or by telephone, individual or group. Researchers are all (or at least should be) aware of the problems inherent in this method: people are not necessarily accurate when they talk about what they do–sometimes they lie deliberately, sometimes they just can’t remember. Either way, the result is misleading. There are also problems with interviewer bias: without necessarily intending to, the researcher can influence the outcome of an interview by guiding the interviewee’s choices. Again, not actually useful.

One of the advantages of mining all this automatic online data is that some of the problems inherent in interviewing are mitigated. If you’re observing behavior directly (such as what songs are being downloaded by whom, or how many posts people make in a day, or where they are when they use a particular website), people can’t lie about this. There’s also no problem with interviewer bias: you are directly observing behavior without interjecting yourself.

But people can lie in other ways, some of which are clever and subtle. This ranges from something as simple as filling in erroneous data on your profile to trying to defeat price-targeting algorithms. An article in The Economist last week discussed methods online retailers are using to target pricing to individual consumers. Consumers could potentially counteract some of the effects of targeted pricing by doing things like using different browsers, or looking at low-price items even though they may already be sure of what they want to buy. (I noticed about a year ago that if I looked at tickets on a particular airline’s website for the same flight more than once, the price went up…but if I cleared my cache, the original price was miraculously restored.)

Basically, it’s worthwhile being cautious about some of this data because the methodological challenges have shifted. And they will continue to do so as new ways of gathering data and new methods for processing it emerge. But, returning to the Girl Geek Dinners for a moment, what I did take away from that session is that there will always be people out there looking for a way to put emerging technologies and emerging research techniques to a use that will have a positive social impact.

(Originally posted on skirt.com)

In a Merry Hour: Caitlin E McDonald

