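I'm generating heatmaps for expression matrices containing 1,000 to 15,000 genes, and I'm only interested in the subset of highly expressed genes. Is there a way to show labels only for the genes with high z-scores?
Right now I just set a very small font size for the x axis, but that isn't a scalable solution.
Example code: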
import numpy as np
import pandas as pd
import seaborn as sns; sns.set(color_codes=True)
from functools import reduce
df = pd.DataFrame(np.random.randn(200, 4), columns=['cell_1', 'cell_2', 'cell_3', 'cell_4',])
idx=([f'Gene {i}' for i in range(0, 200)])
df['gene'] = idx
df.set_index('gene', inplace=True)
g = sns.clustermap(df.transpose(), method='average', metric='correlation', z_score=0, figsize=(15,15), xticklabels=True)
g.ax_heatmap.set_xticklabels(g.ax_heatmap.get_xmajorticklabels(), fontsize = 5)
[Image: example heatmap with too many gene labels]
I'd like a more readable x axis on which only the genes with high z-scores are labeled.
Thanks!
Answer 0 (score: 1)
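Here is a quick and dirty attempt. You could make it cleaner or faster, but you'll get the idea.
It is also worth looking at this matplotlib doc link, which explains how to add custom annotations to a heatmap and may be useful.
You could also consider preprocessing the data frame first and plotting only the relevant data (data frame "filtering").
Result:
Code: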
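The filtering idea could be sketched roughly like this. It is only an illustration: the helper name `filter_high_z_genes` and the threshold of 1.2 are made up for the example, and the z-scores are computed per cell (across genes) on the assumption that this mirrors what `z_score=0` does on the transposed matrix.

```python
import numpy as np
import pandas as pd


def filter_high_z_genes(df, threshold=1.2):
    """Keep only genes (rows) whose largest per-cell z-score reaches threshold.

    Hypothetical helper: z-scores are computed within each cell (column),
    i.e. across genes, using the sample standard deviation (ddof=1).
    """
    z = (df - df.mean(axis=0)) / df.std(axis=0, ddof=1)
    keep = z.max(axis=1) >= threshold
    return df.loc[keep]


# Random demo data shaped like the question's example (genes x cells).
rng = np.random.default_rng(0)
df = pd.DataFrame(
    rng.standard_normal((200, 4)),
    columns=["cell_1", "cell_2", "cell_3", "cell_4"],
    index=[f"Gene {i}" for i in range(200)],
)
filtered = filter_high_z_genes(df, threshold=1.2)
print(f"{len(filtered)} of {len(df)} genes kept")
```

The filtered frame can then be passed to `sns.clustermap(filtered.transpose(), ...)` as before. Keep in mind that clustering and z-scoring a subset will not give the same picture as clustering the full matrix, so this trades completeness for readability.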
import numpy as np
import pandas as pd
import seaborn as sns; sns.set(color_codes=True)
from functools import reduce
total_genes = 50
df = pd.DataFrame(np.random.randn(total_genes, 4), columns=['cell_1', 'cell_2', 'cell_3', 'cell_4',])
idx=([f'Gene {i}' for i in range(0, total_genes)])
df['gene'] = idx
df.set_index('gene', inplace=True)
transposed = df.transpose()
# print(transposed)
g = sns.clustermap(transposed, method='average', metric='correlation', z_score=0, figsize=(15,15), xticklabels=True)
g.ax_heatmap.set_xticklabels(g.ax_heatmap.get_xmajorticklabels())  # , fontsize=5)
threshold = 1.2
x_labels_ticks = g.ax_heatmap.get_xticklabels()
total_genes_above_threshold = 0
for xtickdata in x_labels_ticks:
    gene = xtickdata.get_text()
    # Note: this compares the raw expression values, not the z-scores drawn in the heatmap.
    if transposed[gene].max() >= threshold:
        total_genes_above_threshold += 1
    else:
        # Blank out the label of every gene that never reaches the threshold.
        xtickdata.set_text('')
print("total_genes_above_threshold {}".format(total_genes_above_threshold))
# Re-set the tick labels with the modified list.
g.ax_heatmap.set_xticklabels(x_labels_ticks)
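Since the question asks about z-scores specifically, a variant that thresholds the plotted values rather than the raw ones might look like the sketch below. It assumes that the `ClusterGrid` returned by `sns.clustermap` exposes the z-scored, reordered matrix as `g.data2d`, and that heatmap cells are centered at `i + 0.5`; that is my understanding of seaborn, so please check it against your version. The helper names `sparse_labels` and `label_high_z_genes` are made up for this example.

```python
import pandas as pd


def sparse_labels(data2d, threshold=1.2):
    """Return one label per column, blank unless the column maximum reaches threshold."""
    col_max = data2d.max(axis=0)
    return [str(name) if value >= threshold else "" for name, value in col_max.items()]


def label_high_z_genes(g, threshold=1.2):
    """Relabel the x axis of a seaborn ClusterGrid in place (hypothetical helper).

    Assumes g.data2d holds the values drawn in the heatmap, in plotted order.
    """
    labels = sparse_labels(g.data2d, threshold)
    ax = g.ax_heatmap
    ax.set_xticks([i + 0.5 for i in range(len(labels))])
    ax.set_xticklabels(labels, rotation=90)
    return labels


# Tiny stand-in for g.data2d: rows are cells, columns are genes.
demo = pd.DataFrame(
    {"Gene 0": [0.1, 2.0], "Gene 1": [0.3, -0.5], "Gene 2": [1.2, 0.0]},
    index=["cell_1", "cell_2"],
)
print(sparse_labels(demo, threshold=1.2))
```

With a real plot you would call `label_high_z_genes(g, threshold=1.2)` right after `sns.clustermap(...)`. Because the tick positions are set explicitly, it should not matter whether `xticklabels=True` was passed.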