What is ConSAT?

ConSAT is a command-line tool for the automatic annotation of gene families. It is built on top of GFam and provides extra functionality and higher efficiency.

ConSAT stands for “Consensus Signature Architecture Tool”. Given a set of protein sequences, ConSAT assigns each sequence to its protein family. For our purposes, a protein family is a consensus domain architecture. Proteins within the same family are assumed to descend from a common ancestor (that is, to be evolutionarily related) and therefore to share many properties. Protein families are thus a great help in the study of large protein sets: instead of studying single sequences, we study the set of families, which is smaller than the set of individual sequences.

ConSAT builds the architectures by combining two sources of data: (1) domain assignments from InterPro, and (2) domain assignments from GFams (our own library of putative domains). The two sources are combined so that no domains overlap and the sequence coverage is maximised. ConSAT then assigns functional labels to the architectures in two flavours: Gene Ontology terms and free-text English words.
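
To make the idea of "no overlapping domains, maximum coverage" concrete, here is a minimal Python sketch. It is not ConSAT's actual implementation: it assumes the problem can be treated as classic weighted interval scheduling, where each candidate domain is an interval on the sequence and its weight is the number of residues it covers. The Domain type, the select_architecture function and the domain names in the example are all hypothetical and used only for illustration.

```python
from bisect import bisect_right
from typing import List, NamedTuple


class Domain(NamedTuple):
    """A candidate domain assignment (hypothetical type, for illustration)."""
    name: str
    start: int   # 1-based, inclusive
    end: int     # 1-based, inclusive
    source: str  # e.g. "InterPro" or "GFam"


def select_architecture(domains: List[Domain]) -> List[Domain]:
    """Pick non-overlapping domains that cover the most residues.

    Assumption: plain weighted interval scheduling by dynamic programming;
    the real tool may use different rules (e.g. source priorities).
    """
    doms = sorted(domains, key=lambda d: d.end)
    ends = [d.end for d in doms]
    best = [0] * (len(doms) + 1)   # best[i]: max coverage using first i domains
    take = [False] * len(doms)     # whether domain i is used in best[i + 1]
    prev = [0] * len(doms)         # how many earlier domains end before i starts

    for i, d in enumerate(doms):
        # Count the domains among the first i that end before d starts.
        prev[i] = bisect_right(ends, d.start - 1, 0, i)
        with_d = best[prev[i]] + (d.end - d.start + 1)
        if with_d > best[i]:
            best[i + 1] = with_d
            take[i] = True
        else:
            best[i + 1] = best[i]

    # Walk back through the table to recover the chosen domains.
    chosen = []
    i = len(doms)
    while i > 0:
        if take[i - 1]:
            chosen.append(doms[i - 1])
            i = prev[i - 1]
        else:
            i -= 1
    chosen.reverse()
    return chosen


# Hypothetical candidates for one sequence, pooled from both sources.
candidates = [
    Domain("IPR_A", 10, 100, "InterPro"),
    Domain("GFAM_B", 90, 200, "GFam"),
    Domain("IPR_C", 150, 300, "InterPro"),
    Domain("GFAM_D", 301, 350, "GFam"),
]

architecture = select_architecture(candidates)
print(" / ".join(d.name for d in architecture))
```

In this example GFAM_B overlaps both IPR_A and IPR_C, and dropping it leaves more residues covered than keeping it would, so the resulting architecture is IPR_A / IPR_C / GFAM_D.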
