Seaborn clustermap: show axis labels only for genes with high z-scores

Time: 2019-10-22 16:32:47

Tags: python seaborn

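I am generating heatmaps for expression matrices containing 1,000-15,000 genes, and I am only interested in the subset of highly expressed genes. Is there a way to show labels only for the genes with high z-scores?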

Right now I just set a very small font size for the x-axis, but that is not a scalable solution.

Example code:

import numpy as np
import pandas as pd
import seaborn as sns; sns.set(color_codes=True)

df = pd.DataFrame(np.random.randn(200, 4), columns=['cell_1', 'cell_2', 'cell_3', 'cell_4',])

idx=([f'Gene {i}' for i in range(0, 200)])

df['gene'] = idx
df.set_index('gene', inplace=True)
g = sns.clustermap(df.transpose(), method='average', metric='correlation', z_score=0, figsize=(15,15), xticklabels=True)
g.ax_heatmap.set_xticklabels(g.ax_heatmap.get_xmajorticklabels(), fontsize = 5)

[Figure: example heatmap with too many gene labels]

I would like a more readable x-axis that shows labels only for the genes with high z-scores.

Thanks!

1 Answer:

Answer 0 (score: 1)

Here is a quick and dirty attempt. You could make it cleaner or faster, but you will get the idea.

It is also worth looking at the matplotlib documentation on custom heatmap annotations, which may be useful here.

You could also consider processing the DataFrame first and plotting only the relevant data (that is, "filtering" the DataFrame).
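The filtering idea could look roughly like the sketch below. The helper names `row_zscore` and `keep_high_z_columns` are hypothetical, and the row-wise standardization is only intended to mirror what `z_score=0` does in `clustermap` (mean 0 and sample standard deviation 1 per row); treat that equivalence as an assumption and check it against your seaborn version.

```python
import numpy as np
import pandas as pd


def row_zscore(data):
    """Standardize each row to mean 0 and sample std 1 (hypothetical helper)."""
    return data.sub(data.mean(axis=1), axis=0).div(data.std(axis=1), axis=0)


def keep_high_z_columns(data, threshold):
    """Return the z-scored data restricted to columns (genes) whose
    largest z-score in any row (cell) reaches the threshold."""
    z = row_zscore(data)
    keep = z.max(axis=0) >= threshold  # one boolean per gene
    return z.loc[:, keep]


# Same layout as in the question: rows are cells, columns are genes.
rng = np.random.default_rng(0)
genes = [f'Gene {i}' for i in range(200)]
cells = ['cell_1', 'cell_2', 'cell_3', 'cell_4']
transposed = pd.DataFrame(rng.standard_normal((4, 200)), index=cells, columns=genes)

filtered = keep_high_z_columns(transposed, threshold=1.2)
print(f"kept {filtered.shape[1]} of {transposed.shape[1]} genes")
```

The result can then be passed to `sns.clustermap(filtered, method='average', metric='correlation', xticklabels=True)` without `z_score`, since the values are already standardized. Note that the clustering is then computed on the subset only, so the dendrogram will differ from the one for the full matrix.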

Code:

import numpy as np
import pandas as pd
import seaborn as sns; sns.set(color_codes=True)

total_genes = 50
df = pd.DataFrame(np.random.randn(total_genes, 4), columns=['cell_1', 'cell_2', 'cell_3', 'cell_4',])

idx=([f'Gene {i}' for i in range(0, total_genes)])

df['gene'] = idx
df.set_index('gene', inplace=True)
transposed = df.transpose()
# xticklabels=True asks seaborn to draw one tick label per gene
g = sns.clustermap(transposed, method='average', metric='correlation', z_score=0, figsize=(15,15), xticklabels=True)

threshold = 1.2
x_labels_ticks = g.ax_heatmap.get_xticklabels()

total_genes_above_threshold = 0
for xtickdata in x_labels_ticks:
    # use the public Text API rather than the private _text attribute
    gene = xtickdata.get_text()
    # note: this checks the raw values, not the z-scores that clustermap plots
    if transposed[gene].max() >= threshold:
        total_genes_above_threshold += 1
    else:
        # blank out the label of every gene below the threshold
        xtickdata.set_text('')

print("total_genes_above_threshold {}".format(total_genes_above_threshold))

# re-set the tick labels using the modified list
g.ax_heatmap.set_xticklabels(x_labels_ticks)
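
Since the question asks about z-scores, a variant of the loop above could threshold on the values the heatmap actually displays rather than on the raw data. The sketch below assumes that the `ClusterGrid` returned by `clustermap` exposes the standardized, reordered matrix as `g.data2d` and that the plot was created with `xticklabels=True`, so there is exactly one tick per column. The helper names are hypothetical.

```python
import pandas as pd


def labels_above_threshold(data2d, threshold):
    """Return one label per column: the column name if its largest value
    reaches the threshold, otherwise an empty string."""
    col_max = data2d.max(axis=0)
    return [str(name) if value >= threshold else ''
            for name, value in zip(data2d.columns, col_max)]


def relabel_clustermap(g, threshold):
    """Blank the x tick labels of columns below the threshold.

    Assumes g is a seaborn ClusterGrid drawn with xticklabels=True and that
    g.data2d holds the plotted values in display order.
    """
    labels = labels_above_threshold(g.data2d, threshold)
    g.ax_heatmap.set_xticklabels(labels)
    return labels
```

With the objects from the code above, calling `relabel_clustermap(g, threshold)` after `sns.clustermap(...)` would replace the loop. Building a plain list of strings also avoids mutating the `Text` objects in place.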