Big data and privacy. Or why do I see MTB adverts when reading about US biobanks?

I just replaced the bottom bracket on my old mountain bike. With some tender loving care the trusty bike would happily run for another ten years, if it weren’t for the almost completely worn-out rear derailleur. As Google well knows, I’ve been shopping around for a replacement. While the ad placement did not generate a click-through today (work hard, play hard), it did get me started on a post about managing sensitive data.


The ad showed up while I was reading about the new US initiative to create a major national biobank, one that would contain over a million individuals and link clinical data, medical records and genomic profiles. This week, also in Science, there is a whole section on “the end of privacy”: the challenge of remaining anonymous in a world where your trails are increasingly digital. And those digital trails are stored and mined, often for commercial purposes, but potentially also for great scientific insights.


While there are no favourite children, genomic data is central to ELIXIR, and that human genomic data cannot be truly anonymous was made abundantly clear by Yaniv Erlich in a paper two years ago. Our approach is managed access: making sure that data is stored in secured archives, with access granted only after review by a Data Access Committee to ensure that re-use is in line with patient consent. This is complex: there are technical and data standard challenges, but maybe even larger challenges in providing an infrastructure that supports the legal and ethical review. Yet this is important: medical research increasingly depends on genetic data, and for cancers and rare diseases researchers need access to large populations to hunt down the rare variants. As an organisation formed by European member states that brings together 17 national bioinformatics infrastructures, ELIXIR can, and will, play an important role in setting up, monitoring and maintaining both technical services and the underlying agreements. In partnership with other research infrastructures, such as the European biobanks in BBMRI-ERIC, we have a mandate and a responsibility to provide a comprehensive solution that includes ethical and legal review, data security and, of course, data services for the scientific community.


One of the major bottlenecks in the process is data submission: making sure that data is complete and well annotated and, for human data, that the Data Access review is robust. This currently takes a lot of effort and skill; supporting this will need to be an ELIXIR priority. The Australian data service recently published guidelines on management of sensitive data, and ELIXIR has run several pilot actions in this area. It is a rapidly developing field, and personally I believe pilot projects, to develop technical solutions but also effective approaches to ELSI issues and the skills shortage, will be our main strategy for some time. Through a portfolio of short “sprints” we’ll build the solutions and experience needed for a stable and standardised approach. International collaboration, e.g. through the Global Alliance for Genomics and Health, is key to success in this area, as is a willingness to accept heterogeneity and drive collaboration through agreed principles rather than specified technical solutions.