Question

我可以使用以下代码从多索引数据框生成水平堆叠条：

arrays = [np.array(['bar', 'bar', 'bar', 'baz', 'baz', 'baz', 'foo', 'foo', 'foo', 'qux', 'qux', 'qux']),
          np.array(['one', 'two', 'three', 'four', 'five', 'six', 'seven', 'eight', 'nine', 'ten', 'eleven', 'twelve'])]

s = abs(pd.Series(np.random.randn(12), index=arrays))

ax = s.unstack(level=1).plot.barh(stacked=True, colormap='Paired')

plt.show()

此输出

但我希望每个栏上的最大段（不论类别）总是出现在栏的底部（ i 。 e 。在左边）。我还没有找到barh()的任何参数来完成这个技巧，而在0级排序s对于unstack来说并没有帮助。

Answer 1

您可以直接使用matplotlib.barh：

barh(range(4), ax.sum(axis=1), color=['blue' if one else 'green' for one in ax.one == ax.max(axis=1)]);
barh(range(4), ax.max(axis=1), color=['green' if one else 'blue' for one in ax.one == ax.max(axis=1)]);

当然，您可以使用yticks等来更好地打勾和标签。

修改

对于一般情况，这里概述了如何扩展事物。

首先，从
开始
d = s.unstack(level=1).as_matrix()

现在迭代到np.nansum(d) == 0。

对于每次迭代，条形的长度应为

np.nansum(d, axis=1)

要绘制颜色，可以使用

np.nanargmin(d, axis=1)

（您需要将这些数字映射到颜色）。在每次迭代结束时，使用

d[:, np.nanargmin(d, axis=1)] = np.nan

这将在较长的条形图上绘制较短的条形图，给出了叠条的错觉。

Answer 2

由于数据帧非常稀疏，即每列只有一个值，因此您可以按该值对列进行排序。

import pandas as pd
import numpy as np; np.random.seed(42)
import matplotlib.pyplot as plt

arrays = [np.array(['bar', 'bar', 'bar', 'baz', 'baz', 'baz', 'foo', 'foo', 'foo', 'qux', 'qux', 'qux']),
          np.array(['one', 'two', 'three', 'four', 'five', 'six', 'seven', 'eight', 'nine', 'ten', 'eleven', 'twelve'])]

s = abs(pd.Series(np.random.randn(12), index=arrays))
df = s.unstack(level=1)
df = df[df.columns[np.argsort(df.sum())[::-1]]]
ax = df.plot.barh(stacked=True, colormap='Paired')

plt.show()

pandas（水平）堆叠条，每条段分类

2 个答案: