S2F

What is S2F?

S2F is a command-line tool for performing ab initio protein function prediction in bacteria.

S2F stands for “Sequence To Function”. Given a set of protein sequences, S2F returns a functional annotation for each of them, expressed as Gene Ontology (GO) terms. It builds on InterPro and on homology-based methods (HMMER).

S2F obtains functional labels by combining an initial annotation (from InterPro and HMMER) with function similarity graphs transferred from STRING. The graphs are combined via regression, and the initial annotation is then diffused through the combined graph. The result, for every (protein, GO term) pair, is the probability that the protein is annotated with that term.
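To make the combine-then-diffuse idea concrete, here is a minimal sketch in pure Python. It is not the S2F implementation: the function names, the fixed combination weights (standing in for coefficients that a regression would learn), the damping factor `alpha`, and the toy three-protein graphs are all assumptions made for illustration. The diffusion step is a generic label propagation update, F ← alpha · S · F + (1 − alpha) · Y, where S is the row-normalised combined graph and Y is the initial annotation. The resulting scores stay in [0, 1] but are not calibrated probabilities.

```python
def combine_graphs(graphs, weights):
    """Weighted sum of adjacency matrices (lists of lists).

    The weights are hypothetical; in a real pipeline they would be learned.
    """
    n = len(graphs[0])
    combined = [[0.0] * n for _ in range(n)]
    for graph, weight in zip(graphs, weights):
        for i in range(n):
            for j in range(n):
                combined[i][j] += weight * graph[i][j]
    return combined


def row_normalize(matrix):
    """Scale each row to sum to 1; rows with no edges stay all zero."""
    normalized = []
    for row in matrix:
        total = sum(row)
        normalized.append([v / total if total > 0 else 0.0 for v in row])
    return normalized


def diffuse(adjacency, initial, alpha=0.5, iterations=50):
    """Generic label propagation: F <- alpha * S F + (1 - alpha) * Y.

    adjacency: n x n weighted graph over proteins.
    initial:   n x k matrix of initial scores (proteins x GO terms).
    """
    s = row_normalize(adjacency)
    n, k = len(initial), len(initial[0])
    scores = [row[:] for row in initial]
    for _ in range(iterations):
        updated = []
        for i in range(n):
            row = []
            for t in range(k):
                from_neighbours = sum(s[i][j] * scores[j][t] for j in range(n))
                row.append(alpha * from_neighbours + (1 - alpha) * initial[i][t])
            updated.append(row)
        scores = updated
    return scores


# Toy example: three proteins, one GO term, two evidence graphs.
graph_a = [[0, 1, 0], [1, 0, 0], [0, 0, 0]]  # edge between proteins 0 and 1
graph_b = [[0, 0, 0], [0, 0, 1], [0, 1, 0]]  # edge between proteins 1 and 2
combined = combine_graphs([graph_a, graph_b], weights=[0.5, 0.5])

initial = [[1.0], [0.0], [0.0]]  # only protein 0 has an initial annotation
scores = diffuse(combined, initial)
for protein, row in enumerate(scores):
    print(protein, [round(v, 4) for v in row])
```

In this toy run the annotation of protein 0 reaches protein 2 only through protein 1, so the scores decrease with distance from the annotated protein.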

We showcase the performance of S2F by predicting protein function for 10 bacteria. You can explore these results below: the charts are interactive, and you can select the metric and the organism to inspect.

The S2F paper:
Mateo Torres, Haixuan Yang, Alfonso E. Romero, Alberto Paccanaro
Protein function prediction for newly sequenced organisms
Nature Machine Intelligence, 2021

This paper has been highlighted in Nature Machine Intelligence. Read the comment: “Combining views for newly sequenced organisms” by Yingying Zhang, Shayne D. Wierbowski & Haiyuan Yu.

S2F Software

Quick Start
GitHub Repository
Documentation Wiki

Replicating our results from the paper

Instructions
Input data files
Fork of DeepGOPlus
GOLabeler/NetGO implementation

Bug reports, feedback

Something’s not working for you? Do you think you found an error? Do you want to contribute to the development of S2F? Contact us!