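While plotting groupby objects in pandas is straightforward and easy, I am wondering what the most pythonic (pandastic?) way is to get the unique groups from a groupby object. For example: I am working with atmospheric data and trying to plot diurnal trends over a period of several days or more. Below is the DataFrame containing many days' worth of data, with the timestamp as the index: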
<class 'pandas.core.frame.DataFrame'>
DatetimeIndex: 10909 entries, 2013-08-04 12:01:00 to 2013-08-13 17:43:00
Data columns (total 17 columns):
Date 10909 non-null values
Flags 10909 non-null values
Time 10909 non-null values
convt 10909 non-null values
hino 10909 non-null values
hinox 10909 non-null values
intt 10909 non-null values
no 10909 non-null values
nox 10909 non-null values
ozonf 10909 non-null values
pmtt 10909 non-null values
pmtv 10909 non-null values
pres 10909 non-null values
rctt 10909 non-null values
smplf 10909 non-null values
stamp 10909 non-null values
no2 10909 non-null values
dtypes: datetime64[ns](1), float64(11), int64(2), object(3)
To average the data (and compute other statistics) at each minute of the day across several days, I group the DataFrame by Time:
data = no.groupby('Time')
I can then easily plot the mean NO concentration as well as the quartiles:
ax = figure(figsize=(12,8)).add_subplot(111)
title('Diurnal Profile for NO, NO2, and NOx: East St. Louis Air Quality Study')
ylabel('Concentration [ppb]')
data.no.mean().plot(ax=ax, style='b', label='Mean')
data.no.apply(lambda x: percentile(x, 25)).plot(ax=ax, style='r', label='25%')
data.no.apply(lambda x: percentile(x, 75)).plot(ax=ax, style='r', label='75%')
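What prompts my question is that, to make more interesting-looking plots with something like fill_between(), you need to know the x-axis values, as the documentation shows: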
fill_between(x, y1, y2=0, where=None, interpolate=False, hold=None, **kwargs)
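For the life of me, I cannot figure out the best way to accomplish this. I have tried:
I can make these work, but I know there has to be a better way. Python is far too beautiful for that. Any thoughts or hints?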
UPDATE
unstack() can be used to put the statistics into a new DataFrame, for example:
no_new = no.groupby('Time')['no'].describe().unstack()
no_new.info()
<class 'pandas.core.frame.DataFrame'>
Index: 1440 entries, 00:00 to 23:59
Data columns (total 8 columns):
count 1440 non-null values
mean 1440 non-null values
std 1440 non-null values
min 1440 non-null values
25% 1440 non-null values
50% 1440 non-null values
75% 1440 non-null values
max 1440 non-null values
dtypes: float64(8)
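As a small illustration of where the unique groups live: the keys are available from the groupby object itself and reappear as the index of any aggregated result. This is a minimal sketch on a hypothetical six-row frame (the values are made up, only the column names Time and no follow the question):

```python
import pandas as pd

# Hypothetical miniature of the question's frame: two days, three minutes each.
df = pd.DataFrame({
    "Time": ["00:00", "00:01", "00:02"] * 2,
    "no": [1.0, 2.0, 3.0, 3.0, 4.0, 5.0],
})
grouped = df.groupby("Time")["no"]

# The unique group keys, taken directly from the groupby object.
keys = sorted(grouped.groups)

# The same keys form the index of an aggregated result,
# which is what a plotting call can use as its x values.
means = grouped.mean()
```

Here keys and means.index hold the same three labels, so an aggregated frame such as no_new already carries the x-axis information in its index.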
Although I should be able to use fill_between() with no_new.index, I get a TypeError.
Current plot code and TypeError:
ax = figure(figsize=(12,8)).add_subplot(111)
ax.plot(no_new['mean'])
ax.fill_between(no_new.index, no_new['mean'], no_new['75%'], alpha=.5, facecolor='green')
TypeError:
TypeError Traceback (most recent call last)
<ipython-input-6-47493de920f1> in <module>()
2 ax = figure(figsize=(12,8)).add_subplot(111)
3 ax.plot(no_new['mean'])
----> 4 ax.fill_between(no_new.index, no_new['mean'], no_new['75%'], alpha=.5, facecolor='green')
5 #title('Diurnal Profile for NO, NO2, and NOx: East St. Louis Air Quality Study')
6 #ylabel('Concentration [ppb]')
C:\Users\David\AppData\Local\Enthought\Canopy\User\lib\site-packages\matplotlib\axes.pyc in fill_between(self, x, y1, y2, where, interpolate, **kwargs)
6986
6987 # Convert the arrays so we can work with them
-> 6988 x = ma.masked_invalid(self.convert_xunits(x))
6989 y1 = ma.masked_invalid(self.convert_yunits(y1))
6990 y2 = ma.masked_invalid(self.convert_yunits(y2))
C:\Users\David\AppData\Local\Enthought\Canopy\User\lib\site-packages\numpy\ma\core.pyc in masked_invalid(a, copy)
2237 cls = type(a)
2238 else:
-> 2239 condition = ~(np.isfinite(a))
2240 cls = MaskedArray
2241 result = a.view(cls)
TypeError: ufunc 'isfinite' not supported for the input types, and the inputs could not be safely coerced to any supported types according to the casting rule ''safe''
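The last frame of the traceback can be reproduced in isolation. This is a minimal sketch, assuming the index holds "HH:MM" strings stored as Python objects (the three labels below are made up): np.isfinite has no implementation for such an array, so masked_invalid() fails before anything is drawn.

```python
import numpy as np

# Hypothetical stand-in for no_new.index: time-of-day labels kept as strings.
x = np.array(["00:00", "00:01", "00:02"], dtype=object)

try:
    np.isfinite(x)
    raised = False
except TypeError as exc:
    # Same failure as in the traceback: no isfinite loop for object arrays.
    raised = True
    message = str(exc)
```

This suggests the problem lies in the type of the x values rather than in the grouped statistics themselves.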
The plot as of now looks like this:
Answer 0 (score: 5)
Storing the groupby statistics (mean/25%/75%) as columns in a new DataFrame and then passing the new DataFrame's index as the x parameter of plt.fill_between() works for me (tested with matplotlib 1.3.1). For example:
gdf = df.groupby('Time')[col].describe().unstack()
plt.fill_between(gdf.index, gdf['25%'], gdf['75%'], alpha=.5)
gdf.info() should look like this:
<class 'pandas.core.frame.DataFrame'>
Index: 12 entries, 00:00:00 to 22:00:00
Data columns (total 8 columns):
count 12 non-null float64
mean 12 non-null float64
std 12 non-null float64
min 12 non-null float64
25% 12 non-null float64
50% 12 non-null float64
75% 12 non-null float64
max 12 non-null float64
dtypes: float64(8)
Update: to resolve the TypeError: ufunc 'isfinite' not supported exception, the Time column must first be converted from a series of string objects in "HH:MM" format to a series of datetime.time objects, which can be done as follows:
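A minimal sketch of that conversion followed by the shaded plot, on synthetic data rather than the study's measurements. The frame layout, the column names q25 and q75, and the use of datetime.strptime are assumptions for illustration, not the answer's original code. Plotting uses minutes since midnight as a plain numeric x-axis, a substitution that avoids relying on a unit converter for datetime.time objects:

```python
import datetime as dt

import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Synthetic stand-in for the measurements: 3 days of 1-minute samples.
idx = pd.date_range("2013-08-04", periods=3 * 1440, freq="min")
rng = np.random.default_rng(0)
df = pd.DataFrame({"no": rng.random(len(idx))}, index=idx)
df["Time"] = idx.strftime("%H:%M")  # "HH:MM" strings, as in the question

# Convert the "HH:MM" strings to datetime.time objects.
df["Time"] = df["Time"].map(lambda s: dt.datetime.strptime(s, "%H:%M").time())

# One row per minute of the day, one column per statistic.
grouped = df.groupby("Time")["no"]
stats = pd.DataFrame({
    "mean": grouped.mean(),
    "q25": grouped.quantile(0.25),
    "q75": grouped.quantile(0.75),
})

# Numeric x values (assumption: minutes since midnight) derived from the index.
x = np.array([t.hour * 60 + t.minute for t in stats.index], dtype=float)

fig, ax = plt.subplots(figsize=(12, 8))
ax.plot(x, stats["mean"].to_numpy(), "b", label="Mean")
band = ax.fill_between(x, stats["q25"].to_numpy(), stats["q75"].to_numpy(),
                       alpha=0.5, facecolor="green", label="25% to 75%")
ax.set_xlabel("Minutes since midnight")
ax.set_ylabel("Concentration [ppb]")
ax.legend()
```

The sketch computes the quartiles with quantile() instead of describe().unstack(), because the shape returned by describe() on a grouped column can differ between pandas versions.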