- In the last lab you learned how to group genes into clusters based on similar expression patterns.
- In this lab we extend this concept to build gene networks.
- Gene networks are graphs that show connections between genes with similar expression.
First, calculate the correlation between each gene’s expression and every other gene’s expression across samples:
| | GeneA | GeneB | GeneC | GeneD | GeneE | GeneF |
---|---|---|---|---|---|---|
GeneA | 1.00 | 0.12 | 0.75 | 0.86 | 0.49 | 0.32 |
GeneB | 0.12 | 1.00 | 0.92 | 0.08 | 0.88 | 0.08 |
GeneC | 0.75 | 0.92 | 1.00 | 0.81 | 0.78 | 0.02 |
GeneD | 0.86 | 0.08 | 0.81 | 1.00 | 0.28 | 0.59 |
GeneE | 0.49 | 0.88 | 0.78 | 0.28 | 1.00 | 0.78 |
GeneF | 0.32 | 0.08 | 0.02 | 0.59 | 0.78 | 1.00 |
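
As a minimal sketch of this first step, the snippet below computes pairwise Pearson correlations from a small expression table. The expression values and the helper name `pearson` are hypothetical, made up for illustration; they are not the data behind the table above.

```python
import math

def pearson(x, y):
    """Pearson correlation between two equal-length lists of numbers."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)

# Hypothetical expression values: one list per gene, one value per sample.
expression = {
    "GeneA": [1.0, 2.0, 3.0, 4.0, 5.0],
    "GeneB": [2.1, 3.9, 6.2, 8.1, 9.8],
    "GeneC": [5.0, 3.0, 4.0, 1.0, 2.0],
}

genes = list(expression)

# Gene-by-gene correlation matrix stored as a nested dictionary.
cor = {
    g1: {g2: pearson(expression[g1], expression[g2]) for g2 in genes}
    for g1 in genes
}

for g1 in genes:
    print(g1, [round(cor[g1][g2], 2) for g2 in genes])
```

The result is symmetric with 1.00 on the diagonal, just like the table above.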
Then create an adjacency matrix, with “1” marking gene pairs whose correlation is above a threshold and “0” marking pairs below it. The diagonal is set to “0” so that no gene is connected to itself.
Connect genes with a “1”:
| | GeneA | GeneB | GeneC | GeneD | GeneE | GeneF |
---|---|---|---|---|---|---|
GeneA | 0 | 0 | 1 | 1 | 0 | 0 |
GeneB | 0 | 0 | 1 | 0 | 1 | 0 |
GeneC | 1 | 1 | 0 | 1 | 1 | 0 |
GeneD | 1 | 0 | 1 | 0 | 0 | 0 |
GeneE | 0 | 1 | 1 | 0 | 0 | 1 |
GeneF | 0 | 0 | 0 | 0 | 1 | 0 |
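
The thresholding step can be sketched as follows, using the correlation values from the table above. The cutoff of 0.7 is an assumption: the threshold is not stated here, but any value between 0.59 and 0.75 gives the same adjacency matrix for these numbers.

```python
genes = ["GeneA", "GeneB", "GeneC", "GeneD", "GeneE", "GeneF"]

# Correlation matrix copied from the table above.
cor = [
    [1.00, 0.12, 0.75, 0.86, 0.49, 0.32],
    [0.12, 1.00, 0.92, 0.08, 0.88, 0.08],
    [0.75, 0.92, 1.00, 0.81, 0.78, 0.02],
    [0.86, 0.08, 0.81, 1.00, 0.28, 0.59],
    [0.49, 0.88, 0.78, 0.28, 1.00, 0.78],
    [0.32, 0.08, 0.02, 0.59, 0.78, 1.00],
]

THRESHOLD = 0.7  # assumed cutoff, not given in the lab notes

n = len(genes)

# 1 if two different genes correlate above the cutoff, otherwise 0.
adjacency = [
    [1 if i != j and cor[i][j] > THRESHOLD else 0 for j in range(n)]
    for i in range(n)
]

# Each connection listed once (upper triangle only).
edges = [
    (genes[i], genes[j])
    for i in range(n)
    for j in range(i + 1, n)
    if adjacency[i][j]
]

for row in adjacency:
    print(row)
print(edges)
```

Raising or lowering `THRESHOLD` changes which edges survive, which is why the choice of cutoff matters so much for this kind of network.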
One problem with correlation networks is that it is hard to know what threshold to pick. Further, correlation values can be affected by “noise” in the experiment that may not be biologically relevant.
An alternative (and in my hands better) approach is to use Mutual Ranks. We connect each gene to the genes it has the highest correlations with, regardless of their precise values.
Here, we:
First, calculate the correlation between each gene’s expression and every other gene’s expression across samples:
| | GeneA | GeneB | GeneC | GeneD | GeneE | GeneF |
---|---|---|---|---|---|---|
GeneA | 1.00 | 0.12 | 0.75 | 0.86 | 0.49 | 0.32 |
GeneB | 0.12 | 1.00 | 0.92 | 0.08 | 0.88 | 0.08 |
GeneC | 0.75 | 0.92 | 1.00 | 0.81 | 0.78 | 0.02 |
GeneD | 0.86 | 0.08 | 0.81 | 1.00 | 0.28 | 0.59 |
GeneE | 0.49 | 0.88 | 0.78 | 0.28 | 1.00 | 0.78 |
GeneF | 0.32 | 0.08 | 0.02 | 0.59 | 0.78 | 1.00 |
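Then rank the correlations within each column, so that “1” is the gene most highly correlated with that column’s gene. Tied correlations share the average of their ranks:
| | GeneA | GeneB | GeneC | GeneD | GeneE | GeneF |
---|---|---|---|---|---|---|
GeneA | NA | 3 | 4 | 1 | 4 | 3 |
GeneB | 5 | NA | 1 | 5 | 1 | 4 |
GeneC | 2 | 1 | NA | 2 | 2.5 | 5 |
GeneD | 1 | 4.5 | 2 | NA | 5 | 2 |
GeneE | 3 | 2 | 3 | 4 | NA | 1 |
GeneF | 4 | 4.5 | 5 | 3 | 2.5 | NA |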
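
The ranking step can be sketched like this. Each column of the correlation matrix is ranked separately, skipping the gene itself. Ties are given the average of the ranks they span, which is an assumption (other tie rules exist); `rank_descending` is a hypothetical helper written for this example.

```python
genes = ["GeneA", "GeneB", "GeneC", "GeneD", "GeneE", "GeneF"]

# Correlation matrix copied from the table above.
cor = [
    [1.00, 0.12, 0.75, 0.86, 0.49, 0.32],
    [0.12, 1.00, 0.92, 0.08, 0.88, 0.08],
    [0.75, 0.92, 1.00, 0.81, 0.78, 0.02],
    [0.86, 0.08, 0.81, 1.00, 0.28, 0.59],
    [0.49, 0.88, 0.78, 0.28, 1.00, 0.78],
    [0.32, 0.08, 0.02, 0.59, 0.78, 1.00],
]

def rank_descending(values):
    """Rank values so the largest gets rank 1; ties share the average rank."""
    order = sorted(range(len(values)), key=lambda k: -values[k])
    ranks = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        # Extend `end` over the run of values tied with the one at `pos`.
        end = pos
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        average = (pos + end) / 2 + 1  # positions are 0-based, ranks are 1-based
        for k in order[pos:end + 1]:
            ranks[k] = average
        pos = end + 1
    return ranks

n = len(genes)
ranks = [[None] * n for _ in range(n)]  # None on the diagonal (shown as NA)

for j in range(n):
    others = [i for i in range(n) if i != j]  # every gene except gene j itself
    column_ranks = rank_descending([cor[i][j] for i in others])
    for i, r in zip(others, column_ranks):
        ranks[i][j] = r

for gene, row in zip(genes, ranks):
    print(gene, row)
```

Note that the rank matrix is not symmetric: GeneB is GeneA's 5th-best partner, while GeneA is GeneB's 3rd-best.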
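The two ranks for each gene pair are then combined with a geometric average. Geometric average (geometric mean) of \(x\) and \(y\): \(\sqrt{x \cdot y}\)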
x | y | arith_mean | geom_mean |
---|---|---|---|
1 | 1 | 1.0 | 1.00 |
1 | 10 | 5.5 | 3.16 |
3 | 2 | 2.5 | 2.45 |
3 | 10 | 6.5 | 5.48 |
5 | 3 | 4.0 | 3.87 |
5 | 20 | 12.5 | 10.00 |
100 | 1 | 50.5 | 10.00 |
100 | 2 | 51.0 | 14.14 |
100 | 20 | 60.0 | 44.72 |
When \(x\) and \(y\) are different, the geometric mean is smaller than the arithmetic mean because it weights the smaller number more heavily.
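
A short sketch that recomputes the comparison table above. The function names `arith_mean` and `geom_mean` are chosen here to match the table's column headers.

```python
import math

def arith_mean(x, y):
    """Arithmetic mean of two numbers."""
    return (x + y) / 2

def geom_mean(x, y):
    """Geometric mean of two non-negative numbers."""
    return math.sqrt(x * y)

# The (x, y) pairs from the table above.
pairs = [(1, 1), (1, 10), (3, 2), (3, 10), (5, 3),
         (5, 20), (100, 1), (100, 2), (100, 20)]

print("  x   y  arith   geom")
for x, y in pairs:
    print(f"{x:>3} {y:>3} {arith_mean(x, y):>6.1f} {geom_mean(x, y):>6.2f}")
```

The two means agree only when \(x = y\); the further apart the two numbers are, the further the geometric mean falls below the arithmetic mean.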
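The mutual rank of two genes is the geometric average of their two ranks (how highly GeneA ranks for GeneB, and how highly GeneB ranks for GeneA):
| | GeneA | GeneB | GeneC | GeneD | GeneE | GeneF |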
---|---|---|---|---|---|---|
GeneA | NA | 3.87 | 2.83 | 1.00 | 3.46 | 3.46 |
GeneB | 3.87 | NA | 1.00 | 4.74 | 1.41 | 4.24 |
GeneC | 2.83 | 1.00 | NA | 2.00 | 2.74 | 5.00 |
GeneD | 1.00 | 4.74 | 2.00 | NA | 4.47 | 2.45 |
GeneE | 3.46 | 1.41 | 2.74 | 4.47 | NA | 1.58 |
GeneF | 3.46 | 4.24 | 5.00 | 2.45 | 1.58 | NA |
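
The mutual rank calculation can be sketched as below. The rank matrix is typed in by hand, with tied correlations given average ranks (4.5 and 2.5); that tie rule is an assumption, chosen because it reproduces the values in the table above.

```python
import math

genes = ["GeneA", "GeneB", "GeneC", "GeneD", "GeneE", "GeneF"]

# ranks[i][j]: rank of gene i among the correlations in gene j's column.
# None marks the diagonal (NA). Ties are assumed to share the average rank.
ranks = [
    [None, 3,    4,    1,    4,    3],
    [5,    None, 1,    5,    1,    4],
    [2,    1,    None, 2,    2.5,  5],
    [1,    4.5,  2,    None, 5,    2],
    [3,    2,    3,    4,    None, 1],
    [4,    4.5,  5,    3,    2.5,  None],
]

n = len(genes)
mutual_rank = [[None] * n for _ in range(n)]

for i in range(n):
    for j in range(n):
        if i != j:
            # Geometric mean of the two directional ranks.
            mutual_rank[i][j] = math.sqrt(ranks[i][j] * ranks[j][i])

for gene, row in zip(genes, mutual_rank):
    print(gene, [None if v is None else round(v, 2) for v in row])
```

Because the product `ranks[i][j] * ranks[j][i]` is the same in either order, the mutual rank matrix is symmetric even though the rank matrix is not.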
Finally, connect gene pairs with Mutual Rank <= 3:
| | GeneA | GeneB | GeneC | GeneD | GeneE | GeneF |
---|---|---|---|---|---|---|
GeneA | 0 | 0 | 1 | 1 | 0 | 0 |
GeneB | 0 | 0 | 1 | 0 | 1 | 0 |
GeneC | 1 | 1 | 0 | 1 | 1 | 0 |
GeneD | 1 | 0 | 1 | 0 | 0 | 1 |
GeneE | 0 | 1 | 1 | 0 | 0 | 1 |
GeneF | 0 | 0 | 0 | 1 | 1 | 0 |
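
As a sketch of this last step, the snippet below turns the mutual rank table into a network stored as a neighbor list. The mutual rank values are copied from the table above; the names `MAX_MUTUAL_RANK` and `neighbors` are chosen for this example.

```python
genes = ["GeneA", "GeneB", "GeneC", "GeneD", "GeneE", "GeneF"]

# Mutual ranks copied from the table above (None on the diagonal).
mutual_rank = [
    [None, 3.87, 2.83, 1.00, 3.46, 3.46],
    [3.87, None, 1.00, 4.74, 1.41, 4.24],
    [2.83, 1.00, None, 2.00, 2.74, 5.00],
    [1.00, 4.74, 2.00, None, 4.47, 2.45],
    [3.46, 1.41, 2.74, 4.47, None, 1.58],
    [3.46, 4.24, 5.00, 2.45, 1.58, None],
]

MAX_MUTUAL_RANK = 3  # the cutoff used above

# For each gene, list the genes it is connected to.
neighbors = {gene: [] for gene in genes}
for i, gene_i in enumerate(genes):
    for j, gene_j in enumerate(genes):
        if i != j and mutual_rank[i][j] <= MAX_MUTUAL_RANK:
            neighbors[gene_i].append(gene_j)

for gene, partners in neighbors.items():
    print(gene, "->", ", ".join(partners))
```

Here GeneD and GeneF are connected even though their correlation (0.59) fell below the threshold in the correlation network: each is among the other's top-ranked partners.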
Correlation and mutual rank networks are easy to make and easy to understand, but they have some limitations.