Big Data

Data-driven research looks set to become world-class across a range of areas at the University of Leeds, with projects currently underway to create a new MRC-funded Research Data Service and a national ESRC Consumer Data Research Centre.


The two Research Council Centres will be co-located from summer 2015 in the Leeds Institute for Data Analytics (LIDA) so that all disciplines can engage with and benefit from this rapidly developing expertise. With the Centres raising the bar in standards of data quality, access, protection and exploitation, interested parties in academia and commerce can look forward to a strong future for data analytics.

With funding of £7m awarded by the Medical Research Council (MRC), a team from the Faculty of Medicine and Health are currently finalising plans for the University of Leeds’ first Medical Bioinformatics Centre.

Putting in place a purpose-built infrastructure – a hybrid of technology, people and standard operating procedures – the team’s ultimate aim is to provide a ‘virtual research environment’, where anonymised but person-level data is accessible only to authorised researchers and their collaborators.

Security is key

Protecting this data is a challenge in itself. After a careful analysis of the risks, the team have designed tight controls and procedures to ensure that the data is always accessed and used correctly. Professor Jeremy Wyatt, chairman of the Centre’s academic affairs group, explains, “Researchers will only be issued with an authorised login to access the data once they’ve applied for, and received, the relevant ethics permission. They will also have to complete a short e-learning course to obtain their login details.

“They will then only be given access to the extracts of data that are relevant to their research and data analytics tools on their own encrypted virtual desktop. This prevents any copying of the data to other computers or USB devices and ensures that Centre staff can keep a close eye on how the data is being used, to prevent any risks to confidentiality. While most researchers will access this virtual desktop remotely, we will also have secure rooms in our Centre for the use of researchers who want to analyse highly sensitive data.

“The law is changing on data protection and new EU legislation is on its way next year. For this reason we’ve taken a great deal of care designing a flexible technology platform that we can adapt as the laws change. Essentially we need to build the trust of patients and data providers, including the NHS, social care and other sources, in our data management processes. This will also include an annual external audit of how tightly we have adhered to our standard operating procedures.”

The benefits of the new Centre will be far-reaching. As a result of building greater trust among data providers, researchers will have access to more relevant and better quality data than ever before. The virtual research environment will run on a state-of-the-art ‘super computer’, so they will no longer have to wait for results, and since the analysis software is provided and managed, researchers can focus on generating new knowledge and understanding from the data.

“We’re also designing a new Masters course in tandem with our new MRC Centre,” continues Professor Wyatt. “We’ve noticed an increasing trend for the NHS to advertise data analytics posts – as many as 10 of these jobs come up a week with an average salary of £30k, so the NHS is spending around £13.5m per year on salaries for data analysts. Our new Health Data Analytics track in the Health Informatics Masters will help the NHS and industry to recruit people who are experts in this field.”

Working in partnership

The team from Leeds are cooperating closely with a range of external partners to ensure the Data Centre has access to the highest quality data sets.

“Working in tandem with Leeds Teaching Hospitals Trust (LTHT), England’s largest NHS Trust, we are helping to ensure that the NHS have the right infrastructure in place to transfer fully anonymised versions of electronic data from consenting patients into our Centre. IT professionals in the NHS and the University are working together to link us up across Leeds with a ‘dark fibre network’.

“We are working with The Phoenix Partnership (TPP), one of the top three GP software suppliers, so we can link with their anonymised data resource ResearchOne – which recently won a national prize for outstanding contributions to research. We are also collaborating with the HSCIC (Health and Social Care Information Centre) to gain access to other data sets such as the Hospital Episode Statistics. Finally, we’re partnering with the Clinical Practice Research Datalink (CPRD) in London.”


Research at the new Medical Bioinformatics Centre will link medical records with high-volume molecular data. Several of the initial projects will focus on biomarkers. Professor Rosamonde Banks, a specialist in proteins, Professor Tim Bishop, and bioinformatics expert Professor David Westhead are all heavily involved.

“Commercialising biomarkers potentially has a huge impact on the life sciences industries. Our work dovetails with the government’s push to support academic and commercial life sciences in finding new treatments.

“The pharmaceutical industry is also very interested in this field as it is increasingly under pressure from regulators to ensure that drugs prescribed for one condition aren’t causing serious side effects. And in the new NHS, with qualified commercial providers now offering health services, there is also fresh interest in health data, with data-driven quality improvement high on their agenda.

“Analysis of this data in a secure, monitored and trusted environment will benefit patients by uncovering the underlying molecular mechanisms of disease, suggesting new tests and monitoring how effective medicines are.”

The Consumer Data Research Centre (CDRC)

The Consumer Data Research Centre is driving research on changing society and supporting customer-facing organisations to realise the value of their data and maximise their innovation potential.

The Centre received funding of £11m from the Economic and Social Research Council and builds capacity with shared leadership between the University of Leeds and University College London, with additional partners at the University of Liverpool and the University of Oxford.

The Centre

Mark Birkin, Professor of Spatial Analysis and Policy and Director of the CDRC at Leeds, has spent a number of years working with retailers and leads the project. Its main aim is to open up retailers’ data resources to academic research and find ways in which value can be extracted to benefit business, government and society at large.

“Data from retailers is interesting to us, as geographers, as we can learn about mobility on a short-, medium- and long-term basis. It’s also of importance to our partners in the University’s Business School as they can look into consumer behaviours. Environmentalists are interested, too – in green consumption patterns and sustainability, for example. We think this Data Centre will answer a whole host of social science questions.”

The OmniGlobe helps display visualisations of big data at the Leeds Institute for Data Analytics

How will data be used?

“Analysing good quality data will help to inform policy, development, implementation and evaluation. In terms of location planning, for example, this is helpful not just for retailers but also for schools, police stations and hospitals. We have a complex and aging population, and understanding how they move about helps us plan how we can support them.”

The Centre will have a state-of-the-art data infrastructure with three secure access sites throughout the UK, one of which will be located here at the University. “Data will be available to anyone who is interested. Not just geographers, environmentalists and business students, but other academics, too – criminologists, for instance. Rather than looking at data on an individual level, we’ll deal with broader social, demographic and behaviour patterns.

“We see our work taking place on a number of levels. We will acquire data from retailers and work with them to identify how they can analyse consumer patterns to improve their products and services, whilst at the same time providing a service to researchers and undertaking research ourselves.”

The Centre will also offer training and education, ranging from introductory courses for postgraduate students through to advanced training for data scientists. “Undergraduates will benefit too. We’re already closely involved in teaching undergraduates using new data sources, which will ensure that our graduates have the necessary skills to prosper in the global big data economy.”

Working in partnership

Just a few months into the project, a wide range of retailers have shown an interest, and we are currently meeting with a number of them to discuss how we can work together in the future.

“We need to establish broader agreements for data sharing. Retailers are understandably concerned about handing over data due to competition worries. However, like the Medical Bioinformatics Centre, the CDRC has rigorous controls and procedures in place to protect the data. We’ve already got one retailer on board, and hope to have at least half a dozen signed up by the end of our first year.”

Whilst the two data centres operate independently, the strong synergies between them could prove invaluable in the future. Professor Birkin explains, “If we were able to combine data from the Medical Bioinformatics Centre with insights from consumer research, we could help with a range of challenges that are of national interest. For example, we could trace relationships between shopping habits and health outcomes, such as obesity or diabetes.” Yet such research raises a number of ethical considerations, so there is no commitment either way to share data at this stage. So far, each centre has more than enough research projects lining up to use its own independent datasets.

This article was updated on 20 January 2016 to reflect the full scale of the ESRC’s investment.
