Partially Attended

an irregularly updated blog by Ian Mulvany

ProductTank 6, big data

Wed Jan 23, 2013


I’m at ProductTank 6. Today’s topic is big data.

Heather Savory - Open Data User Group

She is the chair of the Open Data User Group. She is using a recipe analogy for the use of data in creating products: data is an ingredient, and you have to get the ingredients right. (A number of people in the audience work with public sector data, and more work with open data; most people know about it, but they don’t use it.)

Public sector data is often:

  • maps & location data
  • spending information
  • demographic and health data
  • traffic and transport data
  • business information

The key issue is that this data is a by-product of what government does, and it is paid for by our taxes. There is an opportunity to build on top of this data.

The data should be open and free because we already own the data. It’s already been paid for.

They are looking to open up address data. They need the help of the community to put pressure on the Government to make this happen. They need a solid business case, and ideas from the community on how one would exploit the data to help create applications.

(This is so much like the arguments around open access to the scientific literature.)

A question to the audience:

Has charging for this data been a problem in developing applications?

Data.gov.uk
  • Nearly 9k data sets up there.
  • 140+ data sets on the roadmap
  • 810k visitors since July 2012
  • 220 sites globally
  • worth at least 6 Billion to the economy

Where’s the beef?

A lot of data is being used so far, but they need to know what’s missing.

They are looking to get the Met Office to make historic data open. They are trying to make the VAT register open. They want to make Land Registry historic price data open (as a householder I would love to get access to that data). River network centrelines and rights of way data are also in their sights. The most critical is the address data.

There is a comparison between the ODI and Jamie Oliver - Blimey!

The bottom line is that this product development community is in an ideal position to give input into what data can be used for making products.

Duncan Ross - Director of Data Science, Teradata

Duncan is a data miner, and he is one of the founders of the Society of Data Miners. He is very interested in prediction. Talking about the importance of prediction, he points to the hilarious video of Karl Rove totally fucking up the prediction of the election when Fox News called it for Obama.

Sam Walton, the founder of Walmart, understood the power of data. They replaced inventory with information.

You can make predictions that are wrong most of the time, but that are still incredibly valuable.

He gives the example of roulette. If you could make a prediction that was right only once in 36 times, it would be hugely valuable. Likewise, if you build a predictor about your customer base that is wrong most of the time but still beats randomness, and acting on a correct prediction pays off, then the predictor can be hugely valuable.
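The roulette point is really expected-value arithmetic. Here is a minimal sketch in Python, assuming a single-number bet that pays 35 to 1 on a 37-pocket European wheel (my assumptions; the talk didn't specify the bet or the wheel). It shows that a predictor which is wrong more than 96% of the time still has positive expected value once it beats the break-even rate. Note that at exactly 1 in 36 a 35-to-1 bet only breaks even, so the value comes from beating the house's odds. The 1-in-30 predictor is purely hypothetical.

```python
from fractions import Fraction


def expected_value(p_correct, payout):
    """Expected profit per unit staked on a bet that pays `payout`-to-1
    when the prediction is right and loses the stake when it is wrong."""
    return p_correct * payout - (1 - p_correct)


PAYOUT = 35  # assumption: a single-number (straight-up) bet paying 35 to 1

scenarios = [
    ("random guess", Fraction(1, 37)),  # assumption: European wheel, pockets 0-36
    ("1 in 36", Fraction(1, 36)),       # break-even rate for a 35-to-1 payout
    ("1 in 30", Fraction(1, 30)),       # hypothetical better-than-chance predictor
]

for label, p in scenarios:
    wrong = float(1 - p)
    print(f"{label:>12}: wrong {wrong:.1%} of the time, "
          f"EV per unit staked = {expected_value(p, PAYOUT)}")
```

The three scenarios give expected values of -1/37, 0 and 1/5 per unit staked: the hypothetical predictor is wrong 29 times out of 30 and still returns 20% on every bet, which is the shape of Duncan's argument about customer predictors.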

You can also use a predictor to refute a hypothesis.

You can also test and measure your hypothesis.

The topic of big data is big, and The Signal and the Noise by Nate Silver is recommended.

The size of the data is one thing; however, a number of other trends enhance its value.

Crowdsourcing on top of data is powerful. Google, reCAPTCHA, Kaggle and Waze are examples of companies making great use of crowdsourcing.

Location is another thing of great interest.

Gamification is mentioned. (For the record, I am not a fan.) There is an interesting story about SAS. The airline nearly went bankrupt, and its frequent flyers realised that if they all cashed in their air miles at once, the airline would be pushed into bankruptcy, making all of the air miles worthless, so there was a group plea for people not to cash in their air miles. The point being made is that had these companies found a gaming mechanism to build loyalty, it would have been cheaper for them.

Quantified Self is another movement that Duncan is interested in. If you can start to aggregate this data, the data becomes more valuable. (There are of course privacy issues, and I predict that there will be a question about this in the Q&A session. If so, I will be sure to ignore it. Actually, by the end of the evening it was hardly mentioned; I guess that speaks to the keen business sense of the startup community.)

Consumer data lockers are another interesting idea. Take, as an example, data generated by an Orange customer: it’s kind of assumed that the data is owned by Orange. A data locker is the idea that this data is held in a locker that the user has control over. BillGuard does this for credit card transactions: it monitors your transactions for you, looking for fraudulent or wasteful ones (it only works in the US at the moment).

The last thing that Duncan wants to talk about is DataKind. It is based in the US, but it’s coming to the UK soon. The question is:

How come the brightest minds of my generation are trying to improve response rates to advertising rather than solving real problems?

Charities can’t afford to hire good data people. DataKind brings charities and data miners together to work on short hack weekends looking at problems that the data miners can help the charities with.

There is a slide of the Reverend Thomas Bayes (who is buried across the road in Bunhill Fields).

Nigel Shadbolt - ODI

Although I’m covering the event, the issue of open data is probably well known to readers of this blog (all 12 of you), so I’ll only type up the highlights of the talk.

The key issue with open data tends to be the licensing of the data (hear, hear).

They need a steady stream of successes.

A recent success story is the publication of MRSA infection rates in the UK. League tables were published, and in the two years since publication there has been a greater than 90% reduction in infections. There is no way to prove that publishing the data helped, but you know, it probably helped.

Another great example is prescription data. Every month, every NHS prescription is published. Analysis of the prescription of statins indicated that 200M pounds was wasted. This is money that can be saved every year in the future by changing prescribing behaviour.

Contracts and council spending are being made available. Reported crime and crime hotspots are being published.

The real trick is to get the data that people care about, and then build an actionable service around this data. (What is the actionable data for research? I think it has to be related to funding, and to making funding information available.)

Weather data has been made available (this will probably only depress my German wife).

There are some examples of business data. A great example is that now that the spending data for local authorities are being published, there is a company that is selling analytics back to these authorities. Many of them didn’t realise, for example, that they were paying over the odds for services that other authorities were getting better deals on.

There is a very nice overview of the ODI. They have had a great first 10 weeks of operation.

How do you build a business on a data feed which might not be timely, and upon whose quality you cannot depend?

The ODI is a convening point for these conversations, and a place to help bridge connections. Nigel believes that unless they can create a strong demand side for data, the Government’s patience for this programme may wane. One needs to get to a virtuous cycle.

Finishing up

Great event. You can sign up for the next one, and you can get an overview of ProductTank.

This work is licensed under a Creative Commons Attribution 4.0 International License