Question

I have a time-sampled dataset that essentially has a two-column index (timestamp, ID). However, some timestamps have no sample for a given ID.

How can I make a stack plot of this kind of data with Matplotlib?

import pandas as pd
import numpy as np
import io
import matplotlib.pyplot as plt

df = pd.read_csv(io.StringIO('''
A,B,C
1,1,0
1,2,0
1,3,0
1,4,0
2,1,.5
2,2,.2

2,4,.15
3,1,.7

3,3,.1
3,4,.2
'''.strip()))

b = np.unique(df.B)
plt.stackplot(np.unique(df.A),
              [df[df.B==_b].C for _b in b],
              labels=['B:{0}'.format(_b) for _b in b],
)
plt.xlabel('A')
plt.ylabel('C')
plt.legend(loc='upper left')
plt.show()

When I run this program, Python reports:

TypeError: ufunc 'isfinite' not supported for the input types, and the inputs could not be safely coerced to any supported types according to the casting rule ''safe''

When I fill in the missing data points manually (see the blank lines in the string literal), the plot works fine.
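A likely cause, sketched under the assumption that the data is exactly the sample above: read_csv skips blank lines by default, so the per-B selections built in the stackplot call end up with different lengths, while x has one entry per distinct A. stackplot needs one equal-length row per layer, so it cannot stack them. The exact exception (the isfinite TypeError above, or a "inhomogeneous shape" ValueError) probably depends on the NumPy/Matplotlib versions; this sketch only checks the lengths.

```python
import io

import pandas as pd

# The question's sample data without the blank placeholder lines
# (read_csv would skip them anyway, since skip_blank_lines=True by default).
CSV = """A,B,C
1,1,0
1,2,0
1,3,0
1,4,0
2,1,.5
2,2,.2
2,4,.15
3,1,.7
3,3,.1
3,4,.2
"""
df = pd.read_csv(io.StringIO(CSV))

# One series per B value, built the same way as in the question's stackplot call.
series = [df[df.B == b].C for b in sorted(df.B.unique())]
lengths = [len(s) for s in series]

# Not every layer has one value per distinct A, so the rows are ragged.
print(lengths, "vs", df.A.nunique(), "x values")
```

Even if the lengths happened to match, these selections would be positional, so a value could be drawn at the wrong A.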

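Is there a straightforward way to "insert" zero records for the missing samples (like this question, but I have two columns as the index and don't know how to adapt its solution to my case), or is this a bug in Matplotlib?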
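One way to adapt the "reindex and fill with zeros" idea to two key columns is to reindex against the Cartesian product of the observed A and B values with pandas.MultiIndex.from_product. This is a minimal sketch, assuming A plays the timestamp role, B the ID, that every ID is expected at every timestamp, and that a missing sample should count as 0.

```python
import io

import pandas as pd

CSV = """A,B,C
1,1,0
1,2,0
1,3,0
1,4,0
2,1,.5
2,2,.2
2,4,.15
3,1,.7
3,3,.1
3,4,.2
"""
df = pd.read_csv(io.StringIO(CSV))

# Index the values by the two key columns.
indexed = df.set_index(["A", "B"])["C"]

# Every (A, B) pair that should exist: all observed A values times all observed B values.
full_index = pd.MultiIndex.from_product(
    [sorted(df.A.unique()), sorted(df.B.unique())], names=["A", "B"]
)

# reindex adds the missing pairs; fill_value=0 turns them into zero records.
filled = indexed.reindex(full_index, fill_value=0)

# Now each per-B series has exactly one value per A, in A order,
# so these can be passed to plt.stackplot(x, ys, ...).
x = filled.index.get_level_values("A").unique()
ys = [filled.xs(b, level="B").to_numpy() for b in full_index.levels[1]]
```

Because the pairs come from the product of the observed values, a timestamp or ID that never appears anywhere in the data will not be added.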

Answer 1

You can use df.pivot to reshape the DataFrame into a form suitable for calling DataFrame.plot(kind='area'). For example, if

In [46]: df
Out[46]: 
   A  B     C
0  1  1  0.00
1  1  2  0.00
2  1  3  0.00
3  1  4  0.00
4  2  1  0.50
5  2  2  0.20
6  2  4  0.15
7  3  1  0.70
8  3  3  0.10
9  3  4  0.20

then

In [47]: df.pivot(columns='B', index='A')
Out[47]: 
     C                
B    1    2    3     4
A                     
1  0.0  0.0  0.0  0.00
2  0.5  0.2  NaN  0.15
3  0.7  NaN  0.1  0.20

Note that df.pivot fills the missing entries with NaN for you. Now, with the DataFrame in this form (stored as result),

result.plot(kind='area')

produces the desired plot.
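If you would rather keep calling stackplot directly, as in the question, a minimal sketch is to pivot, replace the NaN gaps with explicit zeros via fillna(0), and pass one row per B value. Treating a missing sample as 0 is an assumption about the data, not something pivot decides for you.

```python
import io

import matplotlib.pyplot as plt
import pandas as pd

CSV = """A,B,C
1,1,0
1,2,0
1,3,0
1,4,0
2,1,.5
2,2,.2
2,4,.15
3,1,.7
3,3,.1
3,4,.2
"""
df = pd.read_csv(io.StringIO(CSV))

# values="C" gives single-level columns (one per B); fillna(0) makes the gaps zeros.
wide = df.pivot(index="A", columns="B", values="C").fillna(0)

fig, ax = plt.subplots()
# stackplot stacks rows, so transpose: shape (number of B values, number of A values).
polys = ax.stackplot(
    wide.index,
    wide.to_numpy().T,
    labels=[f"B:{b}" for b in wide.columns],
)
ax.set_xlabel("A")
ax.set_ylabel("C")
ax.legend(loc="upper left")
# plt.show()  # uncomment to display the figure
```

stackplot returns one PolyCollection per layer, here one per B value.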

import pandas as pd
import numpy as np
import io
import matplotlib.pyplot as plt

try:
    # for Python2
    from cStringIO import StringIO 
except ImportError:
    # for Python3
    from io import StringIO


df = pd.read_csv(StringIO('''
A,B,C
1,1,0
1,2,0
1,3,0
1,4,0
2,1,.5
2,2,.2

2,4,.15
3,1,.7

3,3,.1
3,4,.2
'''.strip()))


result = df.pivot(columns='B', index='A')
result.columns = result.columns.droplevel(0)
# Alternatively, the above two lines are equivalent to
# result = df.set_index(['A','B'])['C'].unstack('B')

ax = result.plot(kind='area')
lines, labels = ax.get_legend_handles_labels()
ax.set_ylabel('C')
ax.legend(lines, ['B:{0}'.format(b) for b in result.columns], loc='best')

plt.show()

This produces the stacked area plot.
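To check the comment in the code that the pivot/droplevel pair and the set_index/unstack one-liner build the same table, here is a small sketch using pandas.testing.assert_frame_equal. It also shows passing values='C' to pivot, which avoids the extra column level in the first place.

```python
import io

import pandas as pd

CSV = """A,B,C
1,1,0
1,2,0
1,3,0
1,4,0
2,1,.5
2,2,.2
2,4,.15
3,1,.7
3,3,.1
3,4,.2
"""
df = pd.read_csv(io.StringIO(CSV))

# 1) pivot over the remaining columns, then drop the outer "C" column level
via_pivot = df.pivot(columns="B", index="A")
via_pivot.columns = via_pivot.columns.droplevel(0)

# 2) the alternative from the comment: two-level index, then unstack B
via_unstack = df.set_index(["A", "B"])["C"].unstack("B")

# 3) naming the value column up front gives single-level columns directly
via_values = df.pivot(index="A", columns="B", values="C")

# Raises AssertionError if the tables differ (values, labels, or dtypes).
pd.testing.assert_frame_equal(via_pivot, via_unstack)
pd.testing.assert_frame_equal(via_pivot, via_values)
```

All three leave NaN where a sample is missing; plot(kind='area') is then called on that table.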
