What is S2F?

S2F is a command-line tool for ab initio protein function prediction in bacteria.

S2F stands for “Sequence To Function”. Given a set of protein sequences, S2F returns their functional annotation in the form of Gene Ontology (GO) terms. As a base, it uses InterPro and homology-based methods (HMMER).

S2F obtains functional labels by combining an initial annotation (from InterPro and HMMER) with functional similarity graphs transferred from STRING. The graphs are combined via regression, and the initial annotation is then diffused through the combined graph. Finally, for every (protein, GO term) pair we obtain the probability that the protein is annotated with that term.
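The combine-then-diffuse idea can be illustrated with a minimal sketch. This is a generic label-propagation example written for illustration, not S2F's actual implementation: the function names, the fixed graph weights (S2F is described as obtaining them by regression), the `alpha` parameter and the toy data are all assumptions.

```python
# Illustrative sketch only: generic label propagation, not S2F's own code.

def combine_graphs(graphs, weights):
    """Weighted sum of adjacency matrices.

    The weights are simply given here (assumption); S2F is described
    as fitting the combination by regression.
    """
    n = len(graphs[0])
    return [[sum(w * g[i][j] for g, w in zip(graphs, weights))
             for j in range(n)] for i in range(n)]

def row_normalise(matrix):
    """Scale each row to sum to 1 so a step averages over neighbours."""
    out = []
    for row in matrix:
        total = sum(row)
        out.append([v / total if total else 0.0 for v in row])
    return out

def diffuse(graph, initial, alpha=0.5, iterations=50):
    """Iterate F <- alpha * W F + (1 - alpha) * Y for a single GO term."""
    w = row_normalise(graph)
    n = len(initial)
    scores = list(initial)
    for _ in range(iterations):
        scores = [alpha * sum(w[i][j] * scores[j] for j in range(n))
                  + (1 - alpha) * initial[i]
                  for i in range(n)]
    return scores

# Three hypothetical proteins; only protein 0 has an initial annotation.
graph_a = [[0, 1, 0], [1, 0, 0], [0, 0, 0]]
graph_b = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]
combined = combine_graphs([graph_a, graph_b], [0.7, 0.3])
scores = diffuse(combined, [1.0, 0.0, 0.0])
```

In this toy case the annotated protein keeps the highest score, its direct neighbour receives a smaller one, and the protein two steps away receives the smallest, which is the qualitative behaviour expected from diffusing an annotation over a similarity graph.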

We used a preliminary version of S2F to participate in the CAFA challenge (2010), and the current version in the CAFA 2 (2014) and CAFA-PI (2019) challenges.

We showcase the performance of S2F by predicting protein function for 10 bacterial organisms. You can explore these results below in interactive charts, where you can select the metric and the organism to inspect.


Results by organism


Publications featuring S2F

The CAFA challenge reports improved protein function prediction and new functional annotations for hundreds of genes through experimental screens
bioRxiv 653105, doi: https://doi.org/10.1101/653105, 2019.

A large-scale evaluation of computational protein function prediction
Nature Methods, 10(3):221-7, 2013.

Source code
Download it

Blacklists (Supp. Table 1)
Download (MS Excel)

Wiki on GitHub

