Director for the BDAU interviews the founder of the Open Data Science Conference in London

By Joshua Symons, Policy Fellow, Big Data & Analytical Unit, Centre for Health Policy  

On the 8th and 9th of October, I had the opportunity to attend the Open Data Science Conference in London. In addition to the United Kingdom, the ODSC also takes place on both the East and West Coasts of the US, as well as in Tokyo. The two-day conference had an array of speakers presenting problems and solutions they have worked on as data scientists. It was an opportunity to meet some of the leaders in the field of data science, such as Gael Varoquaux. Gael is a core contributor to the popular Python machine learning library scikit-learn, and he spoke about the new and existing features of this package that help ensure rapid development in data science.

Aside from the technical speakers, there were also interesting talks about how to turn ideas into reality. One such talk was given by Michael Margolis, the CEO of Get Storied, a company which specializes in helping company leaders and corporate teams present their stories to deliver maximum impact. A quote from Ben Horowitz in his keynote stuck with me: “A company without a story is usually a company without a strategy”. Michael works directly with Google, Facebook and other industry leaders to present their visions as stories that involve the audience and draw them into those visions.

One of the other talks that really captured my attention was by Ian Ozsvald, who co-authored High Performance Python. Ian spoke about how he and his wife set out on a quest to identify the source of her frequent sneezing. Together they wrote an iOS app to help collect and analyse the data surrounding her sneezing (up to 26 times a day). They used this data to inform themselves about whether or not the medications that she was prescribed were actually helping to alleviate the symptoms. This journey for answers, from identifying the problem to finding a solution, seemed to me to embody the ethos of a data scientist.

I also had the opportunity to sit with Sheamus McGovern, who is the chair and founder of the ODSC, and ask him a few questions. Sheamus and I share a similar background in finance and so I was very keen to find out what brought him to start this conference. Below is a transcript of our discussion. 

First off, I have to say, this is a great conference, some amazing speakers and fascinating topics. I’ve been particularly struck by the amount of discussion around code and data re-use. Just to get an idea, what gets you really excited about Open Data and its application in science?

The whole inspiration behind the Open Data Science Conference really came out of meetup. I had been going to conferences, mostly professional and some academic, and when I attended my first meetup I was blown away on a number of fronts. First of all, the quality of the talks and even more so than that was the quality of the audience. I thought wow, this is a phenomenon. Coming from a closed environment like finance, to an environment where people are willing to put a lot of their time into openly sharing ideas; the selflessness of that really inspired me. I wanted to build a user conference around open data science and to me that really means the open exchange of ideas, among all backgrounds and professions.

From what I’ve experienced so far, the focus here seems to be around technical solutions for data processing, which is great, but I think you’ll agree there’s also a massive challenge in rapidly obtaining quality open data. What do you see as the greatest challenge with Open Data Science?

I’d say the greatest challenge is regulatory. The legislation is all over the map in both the US and the UK. So there are challenges from both sides: how do we get access to more data, and what forms does it take? With regard to personal data it’s really tricky; there’s a conflict between wanting to share useful data and still retaining privacy. It’s difficult to share data for many reasons: data cleanliness, ownership, provenance, licensing. So, like anything in data science, data wrangling is a huge challenge. Collaborative efforts can be a challenge here as well, say by sharing models and data where you might have one but not the other and both have associated IP [intellectual property].

You’re probably well aware that there is a movement within the UK Government towards more open data and open standards. Do you see a role for the ODSC in calling on governments and industry leaders to produce more open data?

Absolutely. Undoubtedly we need more people who understand the language of data science. Open datasets are the catalyst for innovation because they also allow for dissemination of domain-specific knowledge; it’s more than just data. Democratizing data science is one of the primary goals of the ODSC.

Thank you very much for your time and thank you for the ODSC. What do we have to look forward to for the future of the ODSC?

We are continuing to expand the conferences and also our meetups, hackathons and code sprints. We aim to continue to make data science more accessible to more people. The ODSC is always looking to help facilitate local communities and help enable them to expand.