PHUNKEE

Pairing subgrapHs Using NetworK Environment Equivalence


Hungarian Method






The Hungarian method was used to reduce the many-to-many sequence similarity relationships between proteins in the two networks to the optimal one-to-one node matching. For each possible protein matching a score based on the logarithm of the BLAST E-value is allocated. The Hungarian method locates the set of one-to-one node matches that optimise the overall score in a manner similar to that of the example shown below.

A worked example

To demonstrate how the Hungarian method works, we consider a simple example in which we want to find the optimum match between four Blue nodes and four Red nodes. The score for each potential match is shown in the table below. A lower score indicates a more favourable match, so we want to find a set of node matches that minimises the overall score.

Red1Red2Red3Red4
Blue1817323
Blue23941120
Blue3132416
Blue422892

Step 1

First for each row we subtract the row minimum from the rest of the row.


Red1Red2Red3Red4
Blue1817323-3
Blue23941120-4
Blue3132416-2
Blue422892-2

Step 2

Then for each column we subtract the column minimum from the rest of the column.


Red1Red2Red3Red4
Blue1514020
Blue2350716
Blue3110394
Blue420670
-5-0-0-0

Step 3

Cover the minimum number of rows or columns required to cover all of the zeros in the matrix (shown in light green). If the number of rows or columns required is equal to the number of rows/columns in the matrix then we are finished. Otherwise, we have to proceed to the next step. In this case, we can cover all zeroes with only three lines so we continue.


Red1Red2Red3Red4
Blue1014020
Blue2300716
Blue360394
Blue415670

Step 4

From the above matrix, we first have to find the minimum value not covered (Blue3-Red1 - 6). This value is then subtracted from each uncovered entry and added to each entry covered by both a vertical and horizontal line (Blue1-Red2 and Blue1-Red4) to give the matrix shown below. Step 3 is then repeated. This continues until four rows or columns are needed to cover all zeroes and we are finished.

In this case, the matrix shown below is finished. The zeroes in the matrix determine the optimal matching between nodes. The only possible matching is shown by the zeroes in the purple cells. The final node matches are listed below.


Red1Red2Red3Red4
Blue1020026
Blue2240116
Blue300334
Blue49610

Node matching assignments

Blue1 = Red3
Blue2 = Red2
Blue3 = Red1
Blue4 = Red4




Contact

Structural Bioinformatics Group
Division of Molecular Biosciences
Imperial College, London