Computational Biology Group, Department of Oncology, University of Cambridge

Data Integration, Data Mining and Downstream Analysis

The exponential growth of high-throughput genomic and other biological datasets has created unique challenges in integrating and analysing the results of 'omic' experiments. Current technologies generate data far faster than it can be analysed carefully. Existing analysis methods often produce lists of genes that could be biologically meaningful, but they have usually failed to extract the underlying biology from these lists.

Outline

We are interested in designing computational methods, building software systems and performing downstream integrative analyses of large cancer-associated datasets, as well as in developing schemas and relational databases for storing and searching them. Together these enable novel supervised and unsupervised interrogation and visualization of the complex biological relationships underlying cancer datasets.
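
A minimal sketch of the kind of schema meant here, assuming a simple sample/gene/measurement layout. The table and column names are hypothetical, the values are toy numbers rather than real measurements, and SQLite (Python's built-in `sqlite3`) stands in for MySQL so the example runs without a database server. The composite primary key and the extra index on `gene_id` are what keep per-gene searches across many samples cheap.

```python
import sqlite3

# Hypothetical minimal schema: samples, genes and a measurement table
# linking them. SQLite stands in for MySQL only so the sketch runs
# without a server.
SCHEMA = """
CREATE TABLE sample (
    sample_id TEXT PRIMARY KEY,
    subtype   TEXT
);
CREATE TABLE gene (
    gene_id TEXT PRIMARY KEY,
    symbol  TEXT NOT NULL
);
CREATE TABLE expression (
    sample_id TEXT REFERENCES sample(sample_id),
    gene_id   TEXT REFERENCES gene(gene_id),
    value     REAL NOT NULL,
    PRIMARY KEY (sample_id, gene_id)
);
-- The primary key is ordered by sample first, so add a gene index
-- for queries that look up one gene across all samples.
CREATE INDEX idx_expression_gene ON expression(gene_id);
"""


def build_db() -> sqlite3.Connection:
    """Create an in-memory database and load toy example rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    # Toy values for illustration only, not measured data.
    conn.executemany("INSERT INTO sample VALUES (?, ?)",
                     [("S1", "luminal"), ("S2", "basal")])
    conn.executemany("INSERT INTO gene VALUES (?, ?)",
                     [("G1", "ESR1"), ("G2", "KRT5")])
    conn.executemany("INSERT INTO expression VALUES (?, ?, ?)",
                     [("S1", "G1", 9.1), ("S1", "G2", 3.2),
                      ("S2", "G1", 2.5), ("S2", "G2", 8.7)])
    return conn


def expression_by_subtype(conn: sqlite3.Connection, symbol: str) -> dict:
    """Mean expression of one gene (by symbol) per tumour subtype."""
    rows = conn.execute(
        """
        SELECT s.subtype, AVG(e.value)
        FROM expression e
        JOIN gene g   ON g.gene_id = e.gene_id
        JOIN sample s ON s.sample_id = e.sample_id
        WHERE g.symbol = ?
        GROUP BY s.subtype
        ORDER BY s.subtype
        """,
        (symbol,),
    ).fetchall()
    return dict(rows)


if __name__ == "__main__":
    print(expression_by_subtype(build_db(), "ESR1"))
```

The same tables and join would carry over to MySQL with minor type changes, for example `VARCHAR` rather than `TEXT` for key columns, since MySQL cannot index a `TEXT` column as a primary key without a prefix length.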

The integration of different types of 'omic' data is a common problem that requires both biological and computational modelling. We frequently need to combine two or more expression, genomic, epigenetic, proteomic or regulatory datasets, together with their associated annotation, in a biologically meaningful way. We previously developed the database-driven analysis tool INTERFEROME (Samarajiwa et al., 2008) and a systems biology knowledge discovery environment (TOLLOME), and played a major role in the first integrated analysis of miRNA expression, mRNA expression and genomic changes in human breast cancer (Blenkiron et al., 2007).

We use open-source computing technologies, Application Programming Interfaces (APIs), widgets and web services (SOAP, WSDL, REST) for data collection; relational database technologies (MySQL) for efficient data storage; programming languages (Perl, Java, R) for data processing and statistical computing; and web technologies (PHP, AJAX, FLEX, P5) for developing graphical user interfaces and data visualization methods that support integrative downstream analysis of large datasets.
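
As one illustration of combining two data types, the sketch below aligns mRNA expression with copy number for each gene, using only the samples measured on both platforms, and reports their Pearson correlation, a simple way to flag genes whose expression tracks genomic dosage. It is a hypothetical example, written in Python for brevity rather than in the Perl, Java or R the group uses, and it is not the method of Blenkiron et al. (2007). The `{gene: {sample: value}}` layout, the `min_samples` threshold and the function names are all assumptions.

```python
from math import sqrt
from typing import Dict, List

# Hypothetical layout for each data type: {gene: {sample: value}}.
Matrix = Dict[str, Dict[str, float]]


def pearson(x: List[float], y: List[float]) -> float:
    """Pearson correlation of two equal-length, non-constant vectors."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def integrate(expr: Matrix, cn: Matrix, min_samples: int = 3) -> Dict[str, float]:
    """Correlate expression with copy number for genes present in both.

    Only samples measured on both platforms are used. Genes with fewer
    than `min_samples` shared samples, or with a constant vector on
    either platform, are skipped because the correlation is undefined
    or unreliable for them.
    """
    result: Dict[str, float] = {}
    for gene in sorted(expr.keys() & cn.keys()):
        shared = sorted(expr[gene].keys() & cn[gene].keys())
        if len(shared) < min_samples:
            continue
        x = [expr[gene][s] for s in shared]
        y = [cn[gene][s] for s in shared]
        if len(set(x)) < 2 or len(set(y)) < 2:
            continue  # zero variance: correlation undefined
        result[gene] = pearson(x, y)
    return result
```

With toy inputs in which a gene's expression rises in step with its copy number across three shared samples, `integrate` returns a correlation of 1.0 for that gene. Genes that are constant or measured on too few shared samples are omitted rather than reported as zero, so a missing key means "not assessable", not "uncorrelated".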

Selected publications in this area
  • Samarajiwa SA, Forster S, Auchettl K, Herzog PJ. INTERFEROME: the database of interferon regulated genes. Nucleic Acids Res. 2009 Jan;37(Database issue):D852-7. Epub 2008 Nov 7.
  • Blenkiron C, Goldstein LD, Thorne NP, Spiteri I, Chin SF, Dunning MJ, Barbosa-Morais NL, Teschendorff AE, Green AR, Ellis IO, Tavaré S, Caldas C, Miska EA. MicroRNA expression profiling of human breast cancer identifies new markers of tumour subtype. Genome Biol. 2007 Oct 8;8(10):R214.