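Adding a bar plot with index-specific data along a specific axis of a clustermap

Time: 2019-02-20 14:20:31

Tags: python matplotlib seaborn heatmap

I am trying to add a bar plot (stacked or otherwise) for each row of a seaborn clustermap.

Suppose I have a dataframe like this: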

import pandas as pd
import numpy as np
import random

df = pd.DataFrame(np.random.randint(0,100,size=(100, 8)), columns=["heatMap_1","heatMap_2","heatMap_3","heatMap_4","heatMap_5", "barPlot_1","barPlot_2","barPlot_3"])

df['index'] = [ random.randint(1,10000000)  for k in df.index]
df.set_index('index', inplace=True)
df.head()
         heatMap_1  heatMap_2  heatMap_3  heatMap_4  heatMap_5  barPlot_1  barPlot_2  barPlot_3
index
4552288          9          3         54         37         23         42         94         31
6915023          7         47         59         92         70         96         39         59
2988122         91         29         59         79         68         64         55          5
5060540         68         80         25         95         80         58         72         57
2901025         86         63         36          8         33         17         79         86

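I can create a seaborn clustermap from the first five columns (in this example, those with the prefix heatMap_) using the following (or an equivalent seaborn call):

sns.clustermap(df.iloc[:, 0:5])

and a stacked bar plot from the last three columns (in this example, those with the prefix barPlot_) using:

df.iloc[:, 5:8].plot(kind='bar', stacked=True)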

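But I am a bit confused about how to merge the two plot types. I understand that clustermap creates its own figure, and I am not sure whether I can extract just the heatmap from the clustermap and then use it with subplots (discussed here: Adding seaborn clustermap to figure with other plots). That approach results in a weird output.

Edit: Using this: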

import pandas as pd
import numpy as np
import random
import seaborn as sns; sns.set(color_codes=True)
import matplotlib.pyplot as plt
import matplotlib.gridspec


df = pd.DataFrame(np.random.randint(0,100,size=(100, 8)), columns=["heatMap_1","heatMap_2","heatMap_3","heatMap_4","heatMap_5", "barPlot_1","barPlot_2","barPlot_3"])
df['index'] = [ random.randint(1,10000000)  for k in df.index]
df.set_index('index', inplace=True)
g = sns.clustermap(df.iloc[:,0:5], )
g.gs.update(left=0.05, right=0.45)
gs2 = matplotlib.gridspec.GridSpec(1,1, left=0.6)
ax2 = g.fig.add_subplot(gs2[0])
df.iloc[:,5:8].plot(kind='barh', stacked=True, ax=ax2)

This produces a figure in which the two plots do not really match (the bars are shifted relative to the heatmap rows because of the dendrograms).
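
One possible way to reduce the mismatch while still using clustermap is sketched below. It is only a sketch under assumptions: it reads the clustered row order from `g.dendrogram_row.reordered_ind`, reorders the bar data to match, and places a new axes at the same vertical position as `g.ax_heatmap`. The seed, the 20-row frame, and the axes coordinates are illustrative choices, not values from the question.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import numpy as np
import pandas as pd
import seaborn as sns

rng = np.random.default_rng(0)  # illustrative data, smaller than in the question
cols = [f"heatMap_{i}" for i in range(1, 6)] + [f"barPlot_{i}" for i in range(1, 4)]
df = pd.DataFrame(rng.integers(0, 100, size=(20, 8)), columns=cols)

g = sns.clustermap(df.iloc[:, 0:5])
fig = g.ax_heatmap.figure

# Positions of the original rows in the order the heatmap shows them (top to bottom)
row_order = g.dendrogram_row.reordered_ind
bars = df.iloc[row_order, 5:8]

# Squeeze the clustermap into the left part of the figure
g.gs.update(left=0.05, right=0.45)

# New axes with the same bottom edge and height as the heatmap itself,
# so the dendrogram above the heatmap no longer shifts the bars
pos = g.ax_heatmap.get_position()
ax_bar = fig.add_axes([0.6, pos.y0, 0.35, pos.height])

# seaborn draws heatmap row i between y = i and y = i + 1 on an inverted axis
n = len(bars)
y = np.arange(n) + 0.5
left = np.zeros(n)
for col in bars.columns:
    values = bars[col].to_numpy()
    ax_bar.barh(y, values, left=left, height=0.8, label=col)
    left += values
ax_bar.set_ylim(g.ax_heatmap.get_ylim())
ax_bar.set_yticks([])
ax_bar.legend()
```

Drawing the bars with plain `barh` instead of `DataFrame.plot` keeps the y positions explicit, which makes it easier to match the heatmap's cell centres.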

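Another option is to perform the clustering manually, create a matplotlib heatmap, and then add the associated subplots such as bar plots (discussed here: How to get flat clustering corresponding to color clusters in the dendrogram created by scipy).

Is there a way to use clustermap as a subplot alongside other plots?

This is the result I am looking for [1].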

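1 Answer:

Answer 0 (score: 0)

While this is not a proper answer, I decided to break the problem down and do everything manually. Taking inspiration from the answer here, I decided to cluster and reorder the heatmap separately: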

import scipy.cluster.hierarchy as sch
import scipy.spatial.distance as dist


def heatMapCluter(df):
    """Return df with its columns and rows reordered by hierarchical clustering."""
    row_method = "ward"
    column_method = "ward"
    row_metric = "euclidean"
    column_metric = "euclidean"

    if column_method:
        # Condensed pairwise distances between columns
        d2 = dist.pdist(df.transpose(), metric=column_metric)
        # linkage() expects the condensed form; a square matrix would be
        # treated as raw observation vectors
        Y2 = sch.linkage(d2, method=column_method)
        Z2 = sch.dendrogram(Y2, no_plot=True)
        # Flat cluster label per column (computed but not used further)
        ind2 = sch.fcluster(Y2, 0.7 * max(Y2[:, 2]), "distance")
        idx2 = Z2["leaves"]
        df = df.iloc[:, idx2]
        ind2 = ind2[idx2]

    if row_method:
        # Same procedure for the rows
        d1 = dist.pdist(df, metric=row_metric)
        Y1 = sch.linkage(d1, method=row_method)
        Z1 = sch.dendrogram(Y1, orientation="right", no_plot=True)
        # Flat cluster label per row (computed but not used further)
        ind1 = sch.fcluster(Y1, 0.7 * max(Y1[:, 2]), "distance")
        idx1 = Z1["leaves"]
        df = df.iloc[idx1, :]
        ind1 = ind1[idx1]
    return df

I then rearranged the original dataframe:

clusteredHeatmap = heatMapCluter(df.iloc[:, 0:5].copy())
# Reorder the rows to match the clustered heatmap, then put the clustered
# heatmap columns first, followed by the "barplot" columns
clusteredDataframe = df.reindex(list(clusteredHeatmap.index.values))
clusteredDataframe = clusteredDataframe.reindex(
    list(clusteredHeatmap.columns.values)
    + list(df.iloc[:, 5:8].columns.values),
    axis=1,
)
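
The label-based `reindex` above relies on the index labels being unique; with randomly generated labels a collision is unlikely but possible, and `reindex` raises an error on duplicate labels. As a hedged alternative, the sketch below reorders the whole frame by position using the dendrogram leaves, so the bar-plot columns travel with their rows. The `toy` frame and its values are made up for illustration.

```python
import pandas as pd
import scipy.cluster.hierarchy as sch
import scipy.spatial.distance as dist

# Hypothetical toy frame with a deliberately duplicated index label (7)
toy = pd.DataFrame(
    {
        "heatMap_1": [1, 50, 2, 51, 90],
        "heatMap_2": [3, 60, 4, 61, 95],
        "barPlot_1": [10, 20, 30, 40, 50],
    },
    index=[7, 7, 8, 9, 10],
)

# Cluster the rows using the heatmap columns only
heat = toy[["heatMap_1", "heatMap_2"]].to_numpy()
link = sch.linkage(dist.pdist(heat, metric="euclidean"), method="ward")
leaves = sch.dendrogram(link, no_plot=True)["leaves"]

# Positional reordering: every column of a row moves together,
# and duplicate labels are not a problem
reordered = toy.iloc[leaves]
print(reordered)
```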

I then used gridspec to plot the two "subplots" (the clustered heatmap and the bar plot):

import matplotlib.pyplot as plt
from matplotlib.gridspec import GridSpec

# Now let's plot this - first the heatmap and then the barplot.
# Since it is a "two" part plot which shares the same axis, it is
# better to use gridspec
fig = plt.figure(figsize=(12, 12))
gs = GridSpec(3, 3)
gs.update(wspace=0.015, hspace=0.05)
ax_main = plt.subplot(gs[0:3, :2])
ax_yDist = plt.subplot(gs[0:3, 2], sharey=ax_main)
im = ax_main.imshow(
    clusteredDataframe.iloc[:, 0:5],
    cmap="Greens",
    interpolation="nearest",
    aspect="auto",
)
clusteredDataframe.iloc[:, 5:8].plot(
    kind="barh", stacked=True, ax=ax_yDist, sharey=True
)

ax_yDist.spines["right"].set_color("none")
ax_yDist.spines["top"].set_color("none")
ax_yDist.spines["left"].set_visible(False)
ax_yDist.xaxis.set_ticks_position("bottom")


ax_yDist.set_xlim([0, 100])
ax_yDist.set_yticks([])
ax_yDist.xaxis.grid(False)
ax_yDist.yaxis.grid(False)
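
The layout above omits the dendrogram that appears in the target figure [1]. A minimal sketch of one way to add it is shown here, assuming the row dendrogram is drawn by scipy into its own axes. scipy places the leaves at y = 5, 15, 25, ... from bottom to top, so the heatmap uses `origin="lower"` with a matching `extent`, and the bars are drawn at the same y values. The data, figure size, and width ratios are illustrative assumptions.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import scipy.cluster.hierarchy as sch
import scipy.spatial.distance as dist
from matplotlib.gridspec import GridSpec

rng = np.random.default_rng(0)  # illustrative data
cols = [f"heatMap_{i}" for i in range(1, 6)] + [f"barPlot_{i}" for i in range(1, 4)]
df = pd.DataFrame(rng.integers(0, 100, size=(20, 8)), columns=cols)
heat, bars = df.iloc[:, 0:5], df.iloc[:, 5:8]

link = sch.linkage(dist.pdist(heat.to_numpy()), method="ward")

fig = plt.figure(figsize=(12, 8))
gs = GridSpec(1, 3, width_ratios=[1, 3, 2], wspace=0.02)
ax_den = fig.add_subplot(gs[0])
ax_heat = fig.add_subplot(gs[1])
ax_bar = fig.add_subplot(gs[2])

# Leaf i (counted from the bottom) is drawn at y = 10 * i + 5
den = sch.dendrogram(link, orientation="left", ax=ax_den, no_labels=True)
order = den["leaves"]
n = len(order)

# Row i of the reordered data covers y in [10 * i, 10 * i + 10]
ax_heat.imshow(
    heat.iloc[order].to_numpy(),
    cmap="Greens",
    interpolation="nearest",
    aspect="auto",
    origin="lower",
    extent=(0, heat.shape[1], 0, 10 * n),
)
ax_heat.set_yticks([])

# Stacked horizontal bars centred on the same y positions as the leaves
y = 10 * np.arange(n) + 5
left = np.zeros(n)
for col in bars.columns:
    values = bars.iloc[order][col].to_numpy()
    ax_bar.barh(y, values, left=left, height=8, label=col)
    left += values
ax_bar.set_ylim(0, 10 * n)
ax_bar.set_yticks([])
ax_bar.legend()
```

Setting the same y range on all three axes by hand, rather than sharing axes, keeps the sketch independent of how each plotting call adjusts ticks and limits.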

Jupyter notebook: https://gist.github.com/siddharthst/2a8b7028d18935860062ac7379b9279f

[1] http://code.activestate.com/recipes/578175-hierarchical-clustering-heatmap-python/