BactPepDB

Introduction

BactPepDB is a resource to assist in the identification of new bacterial peptides. It is built by re-annotating all complete prokaryotic genomes available in RefSeq for peptides between 10 and 80 residues long. The predicted peptides are then classified as:

previously identified in RefSeq,
potential pseudogenes (see definition below),
intergenic,
entity-overlapping.

Additional analyses are then performed: a search for similar peptides within or among genera, prediction of signal sequences and transmembrane segments, prediction of secondary structure and disulfide bonds, and a search for homologs with a known 3D structure in the Protein Data Bank (PDB).

As a result, BactPepDB offers candidate peptides from complete prokaryotic genomes, together with information on their conservation and some of their expected biological and structural features.

A brief overview of the database interface and its functionalities is given as a step-by-step tutorial at the bottom of the Help page.

As of 03/02/2026, the database contains 1,747,413 peptides from 557 genera, 1,226 species and 2,240 strains.

Methodology

Gene identification protocol

All publicly available complete bacterial genomes are downloaded from RefSeq. Genes are predicted with BactGeneSHOW, a program designed for small gene detection, and translated into amino acid sequences. Only genes encoding between 10 and 80 amino acids are kept for further analysis.
Candidate genes corresponding to genes already annotated in RefSeq are first identified.
Potential pseudogenes, i.e. sequences corresponding to a portion of a larger, previously annotated gene, are identified using BLASTp.
A list of new short gene candidates is produced, and candidates are annotated as intergenic or in coding regions.
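
As an illustration of the size filter and the intergenic/overlapping annotation described above, here is a minimal Python sketch. It is not the BactPepDB implementation: the `Feature` class, the function names, the 1-based inclusive coordinates, and the choice to ignore strand when testing for overlap are all assumptions made for the example. Only the 10 to 80 residue bounds come from the text.

```python
from dataclasses import dataclass

# Size bounds stated in the text, in amino acid residues.
MIN_LEN, MAX_LEN = 10, 80


@dataclass(frozen=True)
class Feature:
    """Hypothetical genomic interval, 1-based and inclusive at both ends."""
    start: int
    end: int


def in_size_range(peptide: str) -> bool:
    """Return True if the peptide has between 10 and 80 residues."""
    return MIN_LEN <= len(peptide) <= MAX_LEN


def overlaps(a: Feature, b: Feature) -> bool:
    """Return True if the two intervals share at least one position."""
    return a.start <= b.end and b.start <= a.end


def classify_location(candidate: Feature, annotated: list) -> str:
    """Label a candidate by its position relative to annotated features.

    Assumption: any positional overlap counts, regardless of strand.
    """
    if any(overlaps(candidate, feature) for feature in annotated):
        return "entity-overlapping"
    return "intergenic"
```

A candidate spanning positions 100 to 160 next to an annotated gene at 150 to 900 would be labelled "entity-overlapping" under these assumptions, while one ending at position 149 would be labelled "intergenic".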

Biological analyses

The following methods are run systematically on each peptide in the database:

Secondary structure prediction, using Psipred.
Local conformation prediction as a Structural Alphabet predicted profile.
BLAST against the PDB, in order to find structures homologous to peptide sequences.
Prediction of disulfide bonds, using DIpro.
Prediction of transmembrane segments, using TMHMM.
Prediction of signal peptides, using SignalP.
Intra-genus BLASTp, to identify peptides conserved within a particular genus.
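
To show how conservation could be read from an intra-genus BLASTp run, the sketch below parses the standard 12-column tabular output of BLAST+ (`-outfmt 6`) and collects, for each query peptide, the other peptides it matches. The function name, the e-value and identity thresholds, and the idea of treating every non-self hit as evidence of conservation are assumptions for illustration; the source does not state which criteria BactPepDB applies.

```python
from collections import defaultdict

# The 12 default columns of BLAST+ tabular output (-outfmt 6).
FIELDS = (
    "qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
    "qstart", "qend", "sstart", "send", "evalue", "bitscore",
)


def conserved_peptides(lines, max_evalue=1e-5, min_identity=50.0):
    """Map each query peptide to the set of other peptides it matches.

    The thresholds are illustrative defaults, not values from BactPepDB.
    """
    hits = defaultdict(set)
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment lines
        row = dict(zip(FIELDS, line.split("\t")))
        if row["qseqid"] == row["sseqid"]:
            continue  # a peptide matching itself says nothing about conservation
        if float(row["evalue"]) <= max_evalue and float(row["pident"]) >= min_identity:
            hits[row["qseqid"]].add(row["sseqid"])
    return dict(hits)
```

The function accepts any iterable of lines, so an open file handle on the BLAST output can be passed directly.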

References

BactGeneSHOW

P. Nicolas, L. Bize, M. Hoebeke, F. Rodolphe, S. D. Ehrlich, B. Prum, & P. Bessières.
Mining Bacillus subtilis chromosome heterogeneities using hidden Markov models.
Nucleic Acids Res. 2002 March 15; 30(6): 1418-1426.

BLAST 2.2.27+

Camacho C., Coulouris G., Avagyan V., Ma N., Papadopoulos J., Bealer K., & Madden T.L.
BLAST+: architecture and applications.
BMC Bioinformatics 2009 10:421.

DIpro 2.0

J. Cheng, H. Saigo, & P. Baldi.
Large-Scale Prediction of Disulphide Bridges Using Kernel Methods, Two-Dimensional Recursive Neural Networks, and Weighted Graph Matching.
Proteins, vol. 62, no. 3, pp. 617-629 (2006).

Psipred

D.T. Jones
Protein secondary structure prediction based on position-specific scoring matrices.
J. Mol. Biol. 292:195-202 (1999).

SignalP 4.1

Thomas Nordahl Petersen, Søren Brunak, Gunnar von Heijne, & Henrik Nielsen
SignalP 4.0: discriminating signal peptides from transmembrane regions
Nature Methods, 8:785-786, 2011

TMHMM 2.0

A. Krogh, B. Larsson, G. von Heijne, & E. L. L. Sonnhammer.
Predicting transmembrane protein topology with a hidden Markov model: Application to complete genomes.
Journal of Molecular Biology, 305(3):567-580, January 2001.

E. L.L. Sonnhammer, G. von Heijne, & A. Krogh.
A hidden Markov model for predicting transmembrane helices in protein sequences.
In J. Glasgow, T. Littlejohn, F. Major, R. Lathrop, D. Sankoff, and C. Sensen, editors, Proceedings of the Sixth International Conference on Intelligent Systems for Molecular Biology, pages 175-182, Menlo Park, CA, 1998. AAAI Press.

Contact

For any suggestions please contact .
