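Histogram from a pandas DataFrame

Date: 2015-07-29 01:04:51

Tags: python numpy pandas matplotlib

Data

Below is the DataFrame I want to represent as a histogram, with each row as one point. With this data the result will not be interesting, because it gives three bins of equal size. That is fine for now, so please read on!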

>>> outer_df
  patient                         cell  product
0   Pat_1               22RV1_PROSTATE       12
1   Pat_1               DU145_PROSTATE       15
2   Pat_1  LN18_CENTRAL_NERVOUS_SYSTEM        9
3   Pat_2               22RV1_PROSTATE       12
4   Pat_2               DU145_PROSTATE       15
5   Pat_2  LN18_CENTRAL_NERVOUS_SYSTEM        9
6   Pat_3               22RV1_PROSTATE       12
7   Pat_3               DU145_PROSTATE       15
8   Pat_3  LN18_CENTRAL_NERVOUS_SYSTEM        9
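For anyone who wants to reproduce this, the frame above can be rebuilt as follows. This is a minimal sketch: the values are copied from the printout, and the construction itself is an assumption about how the data might be created, not how it was originally loaded.

```python
import pandas as pd

# Rebuild the frame shown above: three patients, the same three cell lines each.
outer_df = pd.DataFrame({
    "patient": ["Pat_1"] * 3 + ["Pat_2"] * 3 + ["Pat_3"] * 3,
    "cell": ["22RV1_PROSTATE", "DU145_PROSTATE", "LN18_CENTRAL_NERVOUS_SYSTEM"] * 3,
    "product": [12, 15, 9] * 3,
})

print(outer_df)
```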

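Desired result

Plot each row as a point on the histogram, but also be able to pick out a specific subset of the data (for example, all points across all cells in purple, those belonging to DU145_PROSTATE in red, and those belonging to 22RV1_PROSTATE in blue) and plot it as an overlaid histogram. I have illustrated this with a picture from the pandas docs: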

Overlaid histogram, with three distributions (I only need 2)

Attempt 1

I first tried the hist method on the DataFrame, but got an error, along with a blank 4x4 grid of histograms.

>>> outer_df.hist()
Traceback (most recent call last):
  File "/usr/lib/python3.3/code.py", line 90, in runcode
    exec(code, self.locals)
  File "<input>", line 1, in <module>
  File "/usr/lib/python3/dist-packages/pandas/tools/plotting.py", line 1977, in hist_frame
    ax.hist(data[col].dropna().values, **kwds)
  File "/usr/lib/python3/dist-packages/matplotlib/axes.py", line 8099, in hist
    xmin = min(xmin, xi.min())
TypeError: unorderable types: str() < float()
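The traceback suggests that the string columns (patient, cell) were passed to matplotlib along with the numeric one. A minimal sketch, assuming that is the cause, is to histogram only the numeric product column. The frame is rebuilt here so the snippet runs on its own, and the Agg backend line is only there so it also runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; drop this line to get a window
import pandas as pd

outer_df = pd.DataFrame({
    "patient": ["Pat_1"] * 3 + ["Pat_2"] * 3 + ["Pat_3"] * 3,
    "cell": ["22RV1_PROSTATE", "DU145_PROSTATE", "LN18_CENTRAL_NERVOUS_SYSTEM"] * 3,
    "product": [12, 15, 9] * 3,
})

# Select the numeric column first, so no strings reach matplotlib.
ax = outer_df["product"].hist()
```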

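Attempt 2

Realising that DataFrame.hist() "draws histograms of the columns on multiple subplots", I moved away from it and tried outer_df.plot(kind='hist', stacked=True). Even though I took this straight from the docs, I am still stuck on this error: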

>>> outer_df.plot(kind='hist', stacked=True)
Traceback (most recent call last):
  File "/usr/lib/python3.3/code.py", line 90, in runcode
    exec(code, self.locals)
  File "<input>", line 1, in <module>
  File "/usr/lib/python3/dist-packages/pandas/tools/plotting.py", line 1612, in plot_frame
    raise ValueError('Invalid chart type given %s' % kind)
ValueError: Invalid chart type given hist
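One possible explanation is that the installed pandas predates support for kind='hist'. As far as I know that option arrived in pandas 0.15.0, so treat the threshold below as an assumption and check the release notes for your version. A quick sketch for checking what is installed:

```python
import pandas as pd

# Assumption: kind='hist' was introduced in pandas 0.15.0.
major, minor = (int(part) for part in pd.__version__.split(".")[:2])
supports_kind_hist = (major, minor) >= (0, 15)

print(pd.__version__, supports_kind_hist)
```

If this prints False, the docs being read are newer than the installed library, which would produce exactly this "Invalid chart type" error.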

Attempt 3 - suggested by @ 816

>>> outer_df.set_index(['patient', 'cell']).unstack('cell').plot(kind='hist', stacked=True)
Traceback (most recent call last):
  File "/usr/lib/python3.3/code.py", line 90, in runcode
    exec(code, self.locals)
  File "<input>", line 1, in <module>
  File "/usr/lib/python3/dist-packages/pandas/tools/plotting.py", line 1612, in plot_frame
    raise ValueError('Invalid chart type given %s' % kind)
ValueError: Invalid chart type given hist

2 answers:

Answer 0 (score: 0)

How about:

outer_df.set_index(['patient', 'cell']).unstack('cell').plot(kind='hist', stacked=True)
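To see what this reshaping produces, here is a small sketch on the sample data. It assumes a pandas version recent enough to accept kind='hist', and it selects the product column before unstacking so the resulting columns are plain cell names. That selection is a small deviation from the one-liner above.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; drop this line to get a window
import pandas as pd

outer_df = pd.DataFrame({
    "patient": ["Pat_1"] * 3 + ["Pat_2"] * 3 + ["Pat_3"] * 3,
    "cell": ["22RV1_PROSTATE", "DU145_PROSTATE", "LN18_CENTRAL_NERVOUS_SYSTEM"] * 3,
    "product": [12, 15, 9] * 3,
})

# One row per patient, one column per cell line.
wide = outer_df.set_index(["patient", "cell"])["product"].unstack("cell")
print(wide)

# Each column becomes its own series in the stacked histogram.
ax = wide.plot(kind="hist", stacked=True)
```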

Answer 1 (score: 0)

How about using the groupby method:

hist_data = {cell: outer_df.loc[inds, 'product'] for cell, inds in outer_df.groupby('cell').groups.items()}

Each value in the dict is a Series corresponding to one cell group. Next, iterate over the cell groups, drawing a histogram each time:

import matplotlib.pyplot as plt

for cell in hist_data:
    hist_data[cell].hist(label=cell)  # every call draws onto the same axes
plt.legend()  # needed for the labels to show up in a legend
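For the overlaid, colour-coded picture described in the question, one option is to call matplotlib directly with shared bin edges. This is a sketch under assumptions rather than a definitive recipe: the colours follow the question, while the bin count, alpha values, and the "all cells" label are arbitrary choices made for illustration.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; drop this line to get a window
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

outer_df = pd.DataFrame({
    "patient": ["Pat_1"] * 3 + ["Pat_2"] * 3 + ["Pat_3"] * 3,
    "cell": ["22RV1_PROSTATE", "DU145_PROSTATE", "LN18_CENTRAL_NERVOUS_SYSTEM"] * 3,
    "product": [12, 15, 9] * 3,
})

# Shared bin edges so the overlaid histograms line up (6 bins, arbitrary).
bins = np.linspace(outer_df["product"].min(), outer_df["product"].max(), 7)

fig, ax = plt.subplots()
# All rows in purple as the background distribution.
ax.hist(outer_df["product"], bins=bins, color="purple", alpha=0.4, label="all cells")

# Selected cell lines drawn on top in their own colours.
colours = {"DU145_PROSTATE": "red", "22RV1_PROSTATE": "blue"}
for cell, colour in colours.items():
    subset = outer_df.loc[outer_df["cell"] == cell, "product"]
    ax.hist(subset, bins=bins, color=colour, alpha=0.6, label=cell)

ax.legend()
```

Passing the same bins to every call matters here: without it each subset would get its own bin edges and the bars would not be comparable.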