Turning Big Data to Knowledge

The term ‘Big Data’ captures the opportunities and challenges biomedical researchers face in accessing, managing, analyzing, and integrating datasets of diverse data types (e.g., molecular '–omics' data, images, phenotypes, exposures, health effects and outcomes, behavioral data, and many others). These datasets are growing exponentially, are becoming increasingly diverse and complex, and are exceeding the ability of currently used approaches to manage and analyze them effectively.



Current Events


March 23, 2015: Society of Toxicology (SOT) Outreach Meeting, San Diego, CA

  • Turning Big Data to Knowledge (BD2K): A discussion of the NIH BD2K initiative and how it might advance the practice of Toxicology and Risk Assessment. The fields of toxicology, pharmacology and risk assessment are undergoing a revolution in the use of pathway-based approaches to evaluate the biological effects of chemicals. These fields would benefit from accessible tools that make big data convenient and intuitive to integrate, analyze, query and visualize. The aim of this ancillary meeting is to reach out to toxicological scientists and introduce them to the NIH big data programs. A panel of researchers will provide short overviews of the BD2K and LINCS initiatives and offer thoughts on how big data can be leveraged for the protection of people and the environment.
  • The meeting will be held in the Del Mar Room at the San Diego Marriott Marquis & Marina (SOT Headquarters, located directly next to the Convention Center).
  • For more information on the Outreach Session, contact Dr. Michael Dourson (dourson@tera.org, ph. 513-542-7475 x33).












BD2K and LINCS Programs


The BD2K program addresses four major aims that, in combination, are meant to enhance the utility of biomedical Big Data by:

  • Facilitating broad use of big data by making them discoverable, accessible, and citable.
  • Conducting research and developing the methods, software, and tools to analyze biomedical Big Data.
  • Enhancing training in the development and use of tools necessary for biomedical Big Data science.
  • Supporting a data ecosystem that accelerates discovery and use.



The Library of Integrated Network-based Cell Signatures (LINCS) program is a major contributor to the Big Data to Knowledge (BD2K) initiative. LINCS is an NIH Common Fund program that supports six Data and Signature Generation Centers as well as common data annotation and cross accessibility. LINCS datasets consist of assay results from large numbers of cultured and primary human cells treated with a large array of bioactive small molecules, ligands and cytokines, or genetic perturbations. Many different assays are used to monitor cell responses, including assays measuring transcript and protein expression, cell phenotype, and cell behavior.


There is significant interest in the toxicological and risk assessment communities in using big data approaches to identify adverse outcome pathways (AOPs) of toxicity and to minimize animal-based toxicity testing. Pathway-based approaches offer an opportunity to evaluate the biological effects of chemicals in a manner that is quicker and more economical, reduces reliance on animals, and provides far greater mechanistic insight.

Consequently, toxicology and risk assessment are beginning to undergo a revolution in the use of pathway data, high throughput assays and chemical structure-activity relationships to assess and predict the biological effects of chemical exposures. However, proper analysis of large data sets and integration of data types currently requires a high level of expertise. For this reason, toxicological sciences would benefit from accessible tools that make big data convenient and intuitive to integrate, analyze, query and visualize. Availability of such tools would help scientists deal more effectively with the thousands of untested chemicals currently in commerce.


The BD2K and LINCS projects are multimillion dollar trans-National Institutes of Health (NIH) initiatives to facilitate discovery, support development of new knowledge, and maximize community engagement. TERA is leading an outreach effort by the BD2K consortium to connect with scientists in the toxicology, pharmacology and risk assessment communities. Two objectives of this outreach effort are to inform potential big data user communities about the big data initiative and to elicit their feedback on community data standards, the types of information needed, and major data concerns.



More information about the NIH BD2K and LINCS programs can be found at: