BayesPeak

BayesPeak is an algorithm for finding enriched locations in ChIP-seq data. A package that provides a genome-wide analysis is being developed and will be released here. Last update: 7/10/09.

C and Perl codes

Instructions

The sequencing procedure that follows chromatin immunoprecipitation produces short sequences representing the ends of the fragments contained in the sample. Once these reads have been aligned back to the reference genome, the resulting BED files need to be converted into counts per window before BayesPeak can analyse them. The Perl script above does exactly that, producing two count files for each sample: one for the forward and one for the reverse DNA strand.

To use it, copy the scripts and data files to your working directory and type:

  perl   forBayesPeak.pl   input_filename.bed   window_length   output_filename_forward.txt   output_filename_reverse.txt

For example, to use the input and output files that follow with 300 bp genomic windows, type:
  perl   forBayesPeak.pl   H3K4me3.bed   300   H3K4me3_for.txt   H3K4me3_rev.txt
  perl   forBayesPeak.pl   Input.bed   300   Input_for.txt   Input_rev.txt
(of course, the output files are already available in this case).
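
To make the conversion step concrete, here is a minimal Python sketch of counting reads per window separately for each strand. It is an illustration only and is not the logic of forBayesPeak.pl, whose details are not described here. The function name count_reads_per_window is hypothetical, and the sketch assumes six-column BED input (strand in the sixth column), non-overlapping windows, and that each read is assigned to a window by its 5' end.

```python
from collections import Counter


def count_reads_per_window(bed_lines, window_length):
    """Count reads per fixed-width window, separately for each strand.

    Hypothetical helper for illustration; not the forBayesPeak.pl logic.
    Returns two Counters keyed by (chromosome, window_index).
    """
    forward, reverse = Counter(), Counter()
    for line in bed_lines:
        fields = line.split()
        # Skip blank lines, track/browser headers and rows without a strand column.
        if len(fields) < 6 or fields[0] in ("track", "browser"):
            continue
        chrom, start, end, strand = fields[0], int(fields[1]), int(fields[2]), fields[5]
        if strand == "+":
            # Assumption: a forward read is binned by its start coordinate.
            forward[(chrom, start // window_length)] += 1
        elif strand == "-":
            # Assumption: a reverse read is binned by its last base
            # (the BED end coordinate is exclusive, hence end - 1).
            reverse[(chrom, (end - 1) // window_length)] += 1
    return forward, reverse


# Made-up reads in the chr16 region studied on this page.
example = [
    "chr16\t92000110\t92000146\tread1\t0\t+",
    "chr16\t92000250\t92000286\tread2\t0\t+",
    "chr16\t92000464\t92000500\tread3\t0\t-",
]
forward, reverse = count_reads_per_window(example, 300)
```

With 300 bp windows, window index 306667 covers positions 92000100 to 92000400, so the two forward reads fall in the same window and the reverse read falls in the next one.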

Then, to do the peak-calling, compile the C code by typing:

  gcc   BayesPeak_H3K4me3.c   -lm   -O3   -o   out

and then run it by typing out or ./out

This will produce the following files:

parameter_estimates.txt   which contains the parameters of the model at each simulation (details will follow)
posterior_probs.txt   which contains the posterior probabilities of all windows in the region under study, and
peaks.bed   which contains all the windows with non-zero probability of being enriched.

We recommend using a threshold of 0.50 for those probabilities and then joining the resulting adjacent windows to define the peaks in the data.
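
The thresholding and joining step can be sketched in Python as follows. This is an illustration under stated assumptions and is not part of the BayesPeak code: the function name call_peaks is hypothetical, windows are assumed to have already been read into (chromosome, start, end, probability) tuples, a window is kept when its probability is strictly greater than the threshold, and two windows count as adjacent when one ends exactly where the next begins.

```python
def call_peaks(windows, threshold=0.5):
    """Join adjacent windows whose posterior probability exceeds the threshold.

    windows: iterable of (chrom, start, end, probability) tuples.
    Returns a list of (chrom, start, end) peaks.
    Hypothetical helper for illustration; not part of BayesPeak.
    """
    # Keep confident windows and order them by chromosome, then position.
    kept = sorted(w for w in windows if w[3] > threshold)
    peaks = []
    for chrom, start, end, _prob in kept:
        if peaks and peaks[-1][0] == chrom and peaks[-1][2] == start:
            # This window starts where the previous peak ends: extend that peak.
            peaks[-1] = (chrom, peaks[-1][1], end)
        else:
            peaks.append((chrom, start, end))
    return peaks


# Made-up probabilities for four consecutive 300 bp windows.
windows = [
    ("chr16", 92000100, 92000400, 0.91),
    ("chr16", 92000400, 92000700, 0.76),
    ("chr16", 92000700, 92001000, 0.12),
    ("chr16", 92001000, 92001300, 0.64),
]
peaks = call_peaks(windows)
```

Here the first two windows are joined into one peak, the third is dropped, and the fourth forms a peak of its own.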

At present, the code runs only on this dataset and looks at the specific region 92-95 Mb on mouse chromosome 16 (mm9). This is a preliminary presentation of our algorithm, and modifications will follow.

Example data and files

This site is being updated. For any enquiries about the above scripts and data, contact C.Spyrou[at]statslab.cam.ac.uk