PINALOG is a protein-protein interaction (PPI) network alignment method that maps similar parts of PPI networks based on protein similarity (both sequence and function) and local network similarity. In the equivalencing process, PINALOG uses not only the sequence similarity and topological similarity of proteins but also the functional similarity of protein pairs.
The algorithm starts by establishing a list of seed protein pairs through the mapping of dense subnetworks (communities). The underlying principle of mapping communities lies in the modularity of PPI networks. Functional modules of these networks are expected to be conserved across species; for example, the proteasome complex exists in all eukaryotes and archaea and in some bacteria. These functional modules may therefore serve as a valuable starting point for mapping proteins across species via their PPI networks. The alignment is then extended from the seeds by mapping their neighbouring proteins, awarding scores to protein pairs whose first and second neighbours have already been aligned. Pairwise alignments of species pairs such as human-yeast, human-mouse, human-fly and human-worm, using networks from the IntAct database, reveal a large number of conserved interactions, of mapped protein pairs that belong to the same HomoloGene groups, and of detected interologs.
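The neighbour-based scoring described above can be illustrated with a small sketch. This is not PINALOG's actual scoring function: the names `topology_bonus` and `extend_alignment`, the `weight` parameter, the additive score and the greedy selection are all assumptions made for illustration, and the toy networks and similarity values are invented.

```python
def neighbours_within_two(graph, node):
    """Return the first and second neighbours of `node`, excluding `node` itself."""
    first = set(graph.get(node, ()))
    second = set()
    for n in first:
        second.update(graph.get(n, ()))
    return (first | second) - {node}


def topology_bonus(graph_a, graph_b, a, b, aligned):
    """Count aligned pairs (x, y) where x is near `a` and y is near `b`."""
    near_a = neighbours_within_two(graph_a, a)
    near_b = neighbours_within_two(graph_b, b)
    return sum(1 for x in near_a if aligned.get(x) in near_b)


def extend_alignment(graph_a, graph_b, similarity, seeds, weight=0.5):
    """One greedy extension round from the seed pairs (hypothetical scheme).

    Candidates are unaligned first neighbours of aligned proteins. Each
    candidate pair is scored as similarity + weight * topology bonus,
    with the bonus computed from the seed alignment only.
    """
    aligned = dict(seeds)
    used_b = set(aligned.values())
    cand_a = {n for a in aligned for n in graph_a.get(a, ())} - set(aligned)
    cand_b = {n for b in used_b for n in graph_b.get(b, ())} - used_b

    scored = []
    for a in cand_a:
        for b in cand_b:
            base = similarity.get((a, b), 0.0)
            bonus = topology_bonus(graph_a, graph_b, a, b, aligned)
            scored.append((base + weight * bonus, a, b))

    # Accept the best-scoring pairs first, keeping the mapping one-to-one.
    for _, a, b in sorted(scored, reverse=True):
        if a not in aligned and b not in used_b:
            aligned[a] = b
            used_b.add(b)
    return aligned


# Toy networks as adjacency sets (invented data).
graph_a = {"a1": {"a2", "a3"}, "a2": {"a1"}, "a3": {"a1"}}
graph_b = {"b1": {"b2", "b3"}, "b2": {"b1"}, "b3": {"b1"}}
# Combined sequence/function similarity of candidate pairs (invented values).
similarity = {("a2", "b2"): 0.6, ("a2", "b3"): 0.5,
              ("a3", "b3"): 0.4, ("a3", "b2"): 0.1}

alignment = extend_alignment(graph_a, graph_b, similarity, {"a1": "b1"})
```

In this toy case the single seed pair (`a1`, `b1`) gives every candidate pair the same topology bonus, so the similarity values decide the outcome; in larger networks the bonus would favour pairs surrounded by already-aligned neighbours.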
PINALOG comprises three main steps:
- Community detection
- Communities are dense regions of the network that are potentially conserved functional modules. CFinder is used to identify the communities in the input networks as the starting point for mapping the two networks.
- Community mapping
- The communities from the two input networks are mapped using the Hungarian method, an algorithm for solving the assignment optimization problem. The communities are mapped using only the sequence and function similarity of the proteins they contain. This step results in a list of seed protein pairs with high sequence and function similarity.
- Extension mapping
- The initial seed mappings are extended to the first neighbours of the seed proteins. In this extension mapping, network topology is included in the protein similarity score to increase the chance of aligning neighbouring pairs, which boosts the number of conserved interactions and helps detect similar modules in the conserved network.
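The community mapping step can be sketched as an assignment problem. The code below is a textbook formulation of the Hungarian (Kuhn-Munkres) method in plain Python, not PINALOG's own implementation; the function names `hungarian` and `map_communities` and the similarity scores (on an invented 0-100 scale) are assumptions for illustration. How PINALOG derives a community-level score from protein sequence and function similarity is not specified here, so the matrix is simply taken as given.

```python
def hungarian(cost):
    """Minimum-cost assignment for a matrix with rows <= columns.

    Returns a dict mapping each row index to its assigned column index.
    Standard O(n^2 * m) formulation using row/column potentials.
    """
    n, m = len(cost), len(cost[0])
    INF = float("inf")
    u = [0] * (n + 1)      # row potentials
    v = [0] * (m + 1)      # column potentials
    p = [0] * (m + 1)      # p[j] = row matched to column j (1-based, 0 = free)
    way = [0] * (m + 1)    # previous column on the augmenting path

    for i in range(1, n + 1):
        p[0] = i
        j0 = 0
        minv = [INF] * (m + 1)
        used = [False] * (m + 1)
        while True:
            used[j0] = True
            i0, delta, j1 = p[j0], INF, 0
            for j in range(1, m + 1):
                if not used[j]:
                    cur = cost[i0 - 1][j - 1] - u[i0] - v[j]
                    if cur < minv[j]:
                        minv[j], way[j] = cur, j0
                    if minv[j] < delta:
                        delta, j1 = minv[j], j
            for j in range(m + 1):
                if used[j]:
                    u[p[j]] += delta
                    v[j] -= delta
                else:
                    minv[j] -= delta
            j0 = j1
            if p[j0] == 0:
                break
        # Flip the matching along the augmenting path back to column 0.
        while j0 != 0:
            j1 = way[j0]
            p[j0] = p[j1]
            j0 = j1

    return {p[j] - 1: j - 1 for j in range(1, m + 1) if p[j] != 0}


def map_communities(similarity):
    """Pair communities so that the total similarity is maximal."""
    cost = [[-s for s in row] for row in similarity]  # maximise by negating
    return hungarian(cost)


# Rows: communities of network A; columns: communities of network B.
# Scores are invented for illustration.
community_similarity = [
    [90, 80, 10],
    [85, 20, 30],
    [20, 50, 70],
]
mapping = map_communities(community_similarity)
total = sum(community_similarity[i][j] for i, j in mapping.items())
```

With these invented scores, picking the single best pair first (community 0 with community 0, score 90) would force a poor match for community 1, whereas the optimal assignment pairs community 0 with 1 and community 1 with 0. This is the reason for solving the mapping as a global assignment problem rather than greedily.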
- Bader GD, Hogue CW (2003) An automated method for finding molecular complexes in large protein networks. BMC Bioinformatics 4: 2.
- Wheeler DL, Barrett T, Benson DA, Bryant SH, Canese K, et al. (2005) Database resources of the National Center for Biotechnology Information. Nucleic Acids Research 33: D39-D45.
- Yu H, Luscombe NM, Lu HX, Zhu X, Xia Y, et al. (2004) Annotation Transfer Between Genomes: Protein–Protein Interologs and Protein–DNA Regulogs. Genome Research 14: 1107-1118.
- Palla G, Derényi I, Farkas I, Vicsek T (2005) Uncovering the overlapping community structure of complex networks in nature and society. Nature 435: 814-818.
- Kuhn HW (2005) The Hungarian method for the assignment problem. Naval Research Logistics (NRL) 52: 7-21.