BactPepDB is a resource designed to assist in the identification of new bacterial peptides. It is built by re-annotating all complete prokaryotic genomes available in RefSeq for peptides between 10 and 80 residues long. The predicted peptides are then classified as:
- previously identified in RefSeq,
- potential pseudogenes (see definition below),
- new candidates, located either in intergenic or in coding regions.
As a result, BactPepDB provides candidate peptides for complete prokaryotic genomes, along with information about their conservation and some of their expected biological and structural features.
As of 12/07/2016, the database contains 1,747,413 peptides from 557 genera, 1,226 species and 2,240 strains.
Gene identification protocol
- All publicly available complete bacterial genomes are downloaded from RefSeq. Genes are then predicted with BactGeneShow, a program designed for small gene detection, and translated into amino acid sequences. Only genes encoding between 10 and 80 amino acids are kept for further analysis.
- Candidate genes corresponding to genes already annotated in RefSeq are first identified.
- Potential pseudogenes, i.e. sequences corresponding to a portion of a larger, previously annotated gene, are identified using BLASTp.
- A list of new short gene candidates is produced, and candidates are annotated as intergenic or in coding regions.
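The classification order described above can be sketched in Python. This is only an illustration, not the BactPepDB implementation: the `Interval` and `Candidate` types, the `classify` function, and the category labels are hypothetical names. It also assumes that a candidate counts as "already annotated" when its coordinates and strand exactly match a RefSeq gene, that the BLASTp pseudogene test has been run beforehand and is passed in as a boolean, and that overlap with a coding region is checked regardless of strand.

```python
from dataclasses import dataclass
from typing import List, Optional

# Peptide length bounds stated in the protocol (in amino acids).
MIN_LEN, MAX_LEN = 10, 80


@dataclass(frozen=True)
class Interval:
    """Genomic interval, 1-based and inclusive (hypothetical helper type)."""
    start: int
    end: int
    strand: str


@dataclass
class Candidate:
    """A predicted short gene and its translated peptide (hypothetical type)."""
    interval: Interval
    peptide: str


def overlaps(a: Interval, b: Interval) -> bool:
    """True if the two intervals share at least one position (strand ignored)."""
    return a.start <= b.end and b.start <= a.end


def classify(candidate: Candidate,
             annotated: List[Interval],
             has_pseudogene_hit: bool) -> Optional[str]:
    """Return a category label, or None if the peptide is outside 10-80 residues.

    Assumption: `has_pseudogene_hit` summarises a separate BLASTp search
    against larger annotated genes; that search is not performed here.
    """
    if not (MIN_LEN <= len(candidate.peptide) <= MAX_LEN):
        return None
    # Step 1: genes already annotated in RefSeq (exact match is an assumption).
    if any(candidate.interval == gene for gene in annotated):
        return "refseq"
    # Step 2: potential pseudogenes.
    if has_pseudogene_hit:
        return "pseudogene"
    # Step 3: new candidates, split by genomic context.
    if any(overlaps(candidate.interval, gene) for gene in annotated):
        return "new_coding"
    return "new_intergenic"


annotated_genes = [Interval(100, 400, "+"), Interval(1000, 1098, "-")]
example = Candidate(Interval(500, 589, "+"), "M" * 29)
print(classify(example, annotated_genes, has_pseudogene_hit=False))  # new_intergenic
```

Testing the categories in a fixed order makes them mutually exclusive, so each candidate receives a single label.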
The following methods are run systematically on each peptide in the database:
- Secondary structure prediction, using Psipred.
- Local conformation prediction, expressed as a predicted structural alphabet profile.
- BLAST against the PDB, in order to find structures homologous to peptide sequences.
- Prediction of disulfide bonds, using DIpro.
- Prediction of transmembrane segments, using TMHMM.
- Prediction of signal peptides, using SignalP.
- Intra-genus BLASTp, to identify peptides conserved within a particular genus.
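As an illustration of the intra-genus conservation step, the sketch below post-processes BLASTp results in the standard 12-column tabular format (`-outfmt 6` in BLAST+). It is not the BactPepDB pipeline: the function name `conserved_strains`, the `strain_of` mapping, and the E-value and identity thresholds are all assumptions chosen for the example.

```python
import csv
import io
from collections import defaultdict
from typing import Dict, Set

# Default column order of BLAST+ tabular output (-outfmt 6).
OUTFMT6_FIELDS = [
    "qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
    "qstart", "qend", "sstart", "send", "evalue", "bitscore",
]


def conserved_strains(blast_tsv: str,
                      strain_of: Dict[str, str],
                      max_evalue: float = 1e-5,      # assumed threshold
                      min_identity: float = 50.0     # assumed threshold (%)
                      ) -> Dict[str, Set[str]]:
    """Map each query peptide to the other strains holding a significant hit."""
    hits: Dict[str, Set[str]] = defaultdict(set)
    reader = csv.DictReader(io.StringIO(blast_tsv),
                            fieldnames=OUTFMT6_FIELDS, delimiter="\t")
    for row in reader:
        if row["qseqid"] == row["sseqid"]:
            continue  # skip self-hits
        if float(row["evalue"]) > max_evalue:
            continue
        if float(row["pident"]) < min_identity:
            continue
        query_strain = strain_of[row["qseqid"]]
        subject_strain = strain_of[row["sseqid"]]
        if subject_strain != query_strain:
            hits[row["qseqid"]].add(subject_strain)
    return dict(hits)


# Tiny hand-made example: three peptides from three strains of one genus.
sample = "\n".join([
    "pepA\tpepB\t92.0\t25\t2\t0\t1\t25\t1\t25\t1e-12\t50.1",
    "pepA\tpepC\t40.0\t20\t12\t0\t1\t20\t1\t20\t1e-6\t30.0",
    "pepA\tpepA\t100.0\t25\t0\t0\t1\t25\t1\t25\t1e-20\t60.0",
    "pepB\tpepA\t92.0\t25\t2\t0\t1\t25\t1\t25\t1e-12\t50.1",
])
strains = {"pepA": "strain1", "pepB": "strain2", "pepC": "strain3"}
print(conserved_strains(sample, strains))
```

Here `pepA` and `pepB` are reported as conserved in each other's strain, while the hit to `pepC` is discarded because its identity falls below the assumed 50% cut-off.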
References
Mining Bacillus subtilis chromosome heterogeneities using hidden Markov models.
Nucleic Acids Res. 30(6):1418-1426, 2002.
BLAST+: architecture and applications.
BMC Bioinformatics 10:421, 2009.
Large-Scale Prediction of Disulphide Bridges Using Kernel Methods, Two-Dimensional Recursive Neural Networks, and Weighted Graph Matching.
Proteins 62(3):617-629, 2006.
Protein secondary structure prediction based on position-specific scoring matrices.
J. Mol. Biol. 292:195-202, 1999.
SignalP 4.0: discriminating signal peptides from transmembrane regions.
Nature Methods 8:785-786, 2011.
Predicting transmembrane protein topology with a hidden Markov model: Application to complete genomes.
J. Mol. Biol. 305(3):567-580, 2001.
A hidden Markov model for predicting transmembrane helices in protein sequences.
In J. Glasgow, T. Littlejohn, F. Major, R. Lathrop, D. Sankoff, and C. Sensen, editors, Proceedings of the Sixth International Conference on Intelligent Systems for Molecular Biology, pages 175-182, Menlo Park, CA, 1998. AAAI Press.