Question

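Although plotting groupby objects in pandas is straightforward, I am wondering what the most pythonic (pandastic?) way is to grab the unique groups from a groupby object. For example: I am working with atmospheric data and trying to plot diurnal trends over a period of several days or more. The following is the DataFrame containing many days' worth of data, where the timestamp is the index: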

<class 'pandas.core.frame.DataFrame'>
DatetimeIndex: 10909 entries, 2013-08-04 12:01:00 to 2013-08-13 17:43:00
Data columns (total 17 columns):
Date     10909  non-null values
Flags    10909  non-null values
Time     10909  non-null values
convt    10909  non-null values
hino     10909  non-null values
hinox    10909  non-null values
intt     10909  non-null values
no       10909  non-null values
nox      10909  non-null values
ozonf    10909  non-null values
pmtt     10909  non-null values
pmtv     10909  non-null values
pres     10909  non-null values
rctt     10909  non-null values
smplf    10909  non-null values
stamp    10909  non-null values
no2      10909  non-null values
dtypes: datetime64[ns](1), float64(11), int64(2), object(3)

To be able to average the data (and take other statistics) at every minute of the day across several days, I group the DataFrame:

data = no.groupby('Time')

I can then easily plot the mean NO concentration as well as the quartiles:

ax = figure(figsize=(12,8)).add_subplot(111)
title('Diurnal Profile for NO, NO2, and NOx: East St. Louis Air Quality Study')
ylabel('Concentration [ppb]')
data.no.mean().plot(ax=ax, style='b', label='Mean')
data.no.apply(lambda x: percentile(x, 25)).plot(ax=ax, style='r', label='25%')
data.no.apply(lambda x: percentile(x, 75)).plot(ax=ax, style='r', label='75%')
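As a side note, the quartile lines above can likely be computed without apply(), using the groupby quantile() method. The following is a minimal sketch on a small made-up DataFrame (the column names Time and no mirror the question; the values are hypothetical), not the asker's actual data:

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the real data: a few readings at two clock times.
df = pd.DataFrame({
    "Time": ["00:00", "00:01", "00:00", "00:01", "00:00"],
    "no": [1.0, 2.0, 3.0, 4.0, 5.0],
})

grouped = df.groupby("Time")["no"]

# quantile() takes a fraction in [0, 1] and returns one value per group.
q25 = grouped.quantile(0.25)
q75 = grouped.quantile(0.75)

# The apply()/percentile form used in the question, for comparison.
q25_apply = grouped.apply(lambda x: np.percentile(x, 25))
```

Both forms return a Series indexed by the group labels, so either can be passed straight to plot().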

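The issue that motivates my question is that, in order to make more interesting plots using functions like fill_between(), it is necessary to know the x-axis values, per the documentation: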

fill_between(x, y1, y2=0, where=None, interpolate=False, hold=None, **kwargs)
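To make the signature concrete, here is a minimal sketch of a fill_between() call with explicit numeric x values. The arrays are invented for illustration, and the Agg backend is selected only so the snippet runs without a display:

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs headless
import matplotlib.pyplot as plt
import numpy as np

# Hypothetical data: x positions plus a lower and an upper curve of equal length.
x = np.arange(5)
lower = np.array([1.0, 1.5, 2.0, 1.5, 1.0])
upper = lower + 1.0

fig, ax = plt.subplots(figsize=(6, 4))

# Shade the region between the two curves; x must be something matplotlib
# can treat as numeric (or convert to numeric).
band = ax.fill_between(x, lower, upper, alpha=0.5, facecolor="green")
```

The key point is that x, y1 and y2 must line up element by element, which is why the group labels are needed alongside the grouped statistics.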

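For the life of me, I cannot figure out the best way to accomplish this. I have tried:

Iterating over the groupby object and creating an array of the groups
Grabbing all of the unique Time entries from the original DataFrame

I can make these work, but I know there is a better way. Python is far too beautiful. Any ideas/hints?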
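For reference, two common ways to get at the group labels are sketched below, assuming a made-up DataFrame with the same Time and no column names as the question (the values are hypothetical). Neither is claimed to be the one canonical approach:

```python
import pandas as pd

# Hypothetical stand-in for the real data.
df = pd.DataFrame({
    "Time": ["00:00", "00:01", "00:00", "00:01", "00:00"],
    "no": [1.0, 2.0, 3.0, 4.0, 5.0],
})

grouped = df.groupby("Time")

# Option 1: .groups maps each group label to the row labels in that group,
# so its keys are the unique groups.
group_keys = sorted(grouped.groups.keys())

# Option 2: any aggregation is indexed by the group labels, so the x values
# come along with the y values.
mean_no = grouped["no"].mean()
x_labels = list(mean_no.index)
```

Option 2 is usually the more convenient one for plotting, because the index and the aggregated values are guaranteed to be in the same order.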

UPDATE: The statistics can be dumped into a new DataFrame using unstack(), for example:

no_new = no.groupby('Time')['no'].describe().unstack()
no_new.info()
<class 'pandas.core.frame.DataFrame'>
Index: 1440 entries, 00:00 to 23:59
Data columns (total 8 columns):
count    1440  non-null values
mean     1440  non-null values
std      1440  non-null values
min      1440  non-null values
25%      1440  non-null values
50%      1440  non-null values
75%      1440  non-null values
max      1440  non-null values
dtypes: float64(8)

Although I should be able to plot with fill_between() using no_new.index, I receive a TypeError.

Current plot code and TypeError:

ax = figure(figsize=(12,8)).add_subplot(111)
ax.plot(no_new['mean'])
ax.fill_between(no_new.index, no_new['mean'], no_new['75%'], alpha=.5, facecolor='green')

TypeError:

TypeError                                 Traceback (most recent call last)
<ipython-input-6-47493de920f1> in <module>()
      2 ax = figure(figsize=(12,8)).add_subplot(111)
      3 ax.plot(no_new['mean'])
----> 4 ax.fill_between(no_new.index, no_new['mean'], no_new['75%'], alpha=.5,     facecolor='green')
      5 #title('Diurnal Profile for NO, NO2, and NOx: East St. Louis Air Quality Study')
      6 #ylabel('Concentration [ppb]')

C:\Users\David\AppData\Local\Enthought\Canopy\User\lib\site-packages\matplotlib\axes.pyc in fill_between(self, x, y1, y2, where, interpolate, **kwargs)
   6986 
   6987         # Convert the arrays so we can work with them
-> 6988         x = ma.masked_invalid(self.convert_xunits(x))
   6989         y1 = ma.masked_invalid(self.convert_yunits(y1))
   6990         y2 = ma.masked_invalid(self.convert_yunits(y2))

C:\Users\David\AppData\Local\Enthought\Canopy\User\lib\site-packages\numpy\ma\core.pyc in masked_invalid(a, copy)
   2237         cls = type(a)
   2238     else:
-> 2239         condition = ~(np.isfinite(a))
   2240         cls = MaskedArray
   2241     result = a.view(cls)

TypeError: ufunc 'isfinite' not supported for the input types, and the inputs could not be safely coerced to any supported types according to the casting rule ''safe''
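The traceback points at np.isfinite() being applied to the x values. A small sketch, assuming the index holds plain "HH:MM" strings as the info() output above suggests, reproduces the same kind of failure outside matplotlib:

```python
import numpy as np

# Assumption: the grouped index is made of strings such as "00:00",
# which NumPy stores as an object-dtype array.
string_x = np.array(["00:00", "00:01", "00:02"], dtype=object)

try:
    np.isfinite(string_x)
    raised = False
except TypeError:
    # isfinite has no implementation for object/string inputs.
    raised = True

# Numeric x values pass the same check without complaint.
numeric_ok = bool(np.isfinite(np.array([0.0, 1.0, 2.0])).all())
```

This suggests the problem lies with the type of the index values rather than with the groupby result itself.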

The plot as of now looks like this: [image]

Answer 1

Storing the groupby statistics (mean, 25%, 75%) as columns in a new DataFrame and then passing the new DataFrame's index as the x parameter of plt.fill_between() works for me (tested with matplotlib 1.3.1). For example:

gdf = df.groupby('Time')[col].describe().unstack()
plt.fill_between(gdf.index, gdf['25%'], gdf['75%'], alpha=.5)

gdf.info() should look like this:

<class 'pandas.core.frame.DataFrame'>
Index: 12 entries, 00:00:00 to 22:00:00
Data columns (total 8 columns):
count    12  non-null float64
mean     12  non-null float64
std      12  non-null float64
min      12  non-null float64
25%      12  non-null float64
50%      12  non-null float64
75%      12  non-null float64
max      12  non-null float64
dtypes: float64(8)

Update: To resolve the TypeError: ufunc 'isfinite' not supported exception, the Time column must first be converted from a Series of string objects in "HH:MM" format to a Series of datetime.time objects.
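One possible way to do that conversion is sketched below. It uses pd.to_datetime with an explicit format, which is an assumption about how to do it and not necessarily the code the answer originally used; the DataFrame contents are hypothetical:

```python
import datetime
import pandas as pd

# Hypothetical data with the Time column stored as "HH:MM" strings.
df = pd.DataFrame({
    "Time": ["00:00", "13:45", "23:59"],
    "no": [1.0, 2.0, 3.0],
})

# Parse the strings, then keep only the time-of-day part,
# which yields datetime.time objects.
df["Time"] = pd.to_datetime(df["Time"], format="%H:%M").dt.time

# Alternative (a swapped-in technique, not from the answer): plain numeric
# minutes since midnight, which any matplotlib version can plot directly.
minutes = [t.hour * 60 + t.minute for t in df["Time"]]
```

Whether matplotlib accepts datetime.time values on the x axis may depend on the installed pandas and matplotlib versions, so the numeric fallback can be useful if the conversion alone does not resolve the error.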

Plotting with GroupBy in Pandas/Python