BayesPeak is a BioConductor package for the analysis of data sets from ChIP-seq experiments, particularly for identifying the genomic sites of protein-DNA interactions.
The algorithm models the positions and orientations of the sequenced fragments and determines the locations of enriched areas, corresponding to binding sites and histone modifications, by using a hidden Markov model and Bayesian statistical methodology.
The Bayesian approach to parameter and state estimation returns posterior probabilities as a measure of certainty. These probabilities lend themselves to direct interpretation and can be used as weights in subsequent analyses (e.g. motif discovery).
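To illustrate how a hidden Markov model can yield a posterior probability of enrichment for each genomic window, here is a minimal Python sketch of the forward-backward recursion for a two-state chain (background vs. enriched). It is not the BayesPeak implementation: the function name `posterior_enriched`, the Poisson emissions, and all parameter values (`lam_bg`, `lam_enriched`, `stay`, `prior_enriched`) are assumptions chosen for brevity, and the parameters are held fixed rather than estimated.

```python
import math


def poisson_pmf(k, lam):
    """Poisson probability of observing count k with rate lam."""
    return math.exp(k * math.log(lam) - lam - math.lgamma(k + 1))


def posterior_enriched(counts, lam_bg=2.0, lam_enriched=10.0,
                       stay=0.9, prior_enriched=0.1):
    """Posterior probability that each window is in the enriched state.

    Hypothetical two-state HMM: state 0 = background, state 1 = enriched.
    All parameters are fixed, illustrative values.
    """
    rates = (lam_bg, lam_enriched)
    trans = ((stay, 1.0 - stay), (1.0 - stay, stay))
    init = (1.0 - prior_enriched, prior_enriched)
    n = len(counts)
    states = (0, 1)

    # Forward pass, normalised at every step to avoid underflow.
    alpha = []
    for t, y in enumerate(counts):
        emit = [poisson_pmf(y, r) for r in rates]
        if t == 0:
            a = [init[s] * emit[s] for s in states]
        else:
            prev = alpha[-1]
            a = [emit[s] * sum(prev[q] * trans[q][s] for q in states)
                 for s in states]
        total = sum(a)
        alpha.append([x / total for x in a])

    # Backward pass, also normalised at every step.
    beta = [[1.0, 1.0] for _ in range(n)]
    for t in range(n - 2, -1, -1):
        emit_next = [poisson_pmf(counts[t + 1], r) for r in rates]
        b = [sum(trans[s][q] * emit_next[q] * beta[t + 1][q] for q in states)
             for s in states]
        total = sum(b)
        beta[t] = [x / total for x in b]

    # Combine and renormalise per window to obtain state posteriors.
    posteriors = []
    for t in range(n):
        g = [alpha[t][s] * beta[t][s] for s in states]
        posteriors.append(g[1] / (g[0] + g[1]))
    return posteriors


counts = [1, 2, 0, 12, 9, 11, 1, 0]  # made-up read counts per window
post = posterior_enriched(counts)
```

Each entry of `post` lies between 0 and 1, so it could be passed directly as a weight to a downstream step, in the spirit of the motif-discovery use mentioned above.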
The other important feature of the algorithm is the use of the negative binomial distribution to model the counts of sequenced reads. This allows for overdispersion and provides a better fit to the data than the Poisson distribution that has been widely used by other methods.
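The overdispersion point can be checked numerically. The Python sketch below compares a Poisson and a negative binomial distribution with the same mean; the mean/size parameterisation, the helper names, and the values `mean = 5.0` and `size = 2.0` are assumptions made for illustration and are not taken from the package.

```python
import math


def poisson_pmf(k, mean):
    """Poisson probability of count k; the variance equals the mean."""
    return math.exp(k * math.log(mean) - mean - math.lgamma(k + 1))


def negbin_pmf(k, mean, size):
    """Negative binomial probability of count k.

    Parameterised by mean and size (dispersion), so that
    variance = mean + mean**2 / size.
    """
    p = size / (size + mean)
    log_pmf = (math.lgamma(k + size) - math.lgamma(size) - math.lgamma(k + 1)
               + size * math.log(p) + k * math.log(1.0 - p))
    return math.exp(log_pmf)


def variance(pmf, upper=500):
    """Variance of a count distribution, summed over 0..upper-1."""
    m = sum(k * pmf(k) for k in range(upper))
    return sum((k - m) ** 2 * pmf(k) for k in range(upper))


mean, size = 5.0, 2.0  # illustrative values only
var_pois = variance(lambda k: poisson_pmf(k, mean))
var_nb = variance(lambda k: negbin_pmf(k, mean, size))
print(var_pois, var_nb)
```

With these values the Poisson variance stays at the mean, while the negative binomial variance is mean + mean²/size = 17.5. The extra dispersion parameter is what lets the model accommodate read counts that vary more than a Poisson model permits.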
Last update: 28/04/10.
BayesPeak is available as an R package from the 2.6 release of BioConductor onwards. Instructions for using the package can be found in the package vignette, on the same page.
Example data and files
More detailed information on the BayesPeak algorithm can be found in the following papers:
- BayesPeak: Bayesian Analysis of ChIP-seq Data (BMC Bioinformatics)
- Christiana Spyrou's Ph.D. thesis: Development and application of Bayesian methodology for some missing data problems in biology (PDF - 20.1MB)