Grouping and plotting data by time in Python

Time: 2017-06-01 11:49:44

Tags: python-2.7 pandas pandas-groupby

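I have a CSV file and I am trying to plot the monthly average of some values. I think I should first group the data by day and then by month in order to compute the mean. The file is structured as follows: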

timestamp,heure,lat,lon,impact,type
2007-01-01 00:00:00,13:58:43,33.837,-9.205,10.3,1
2007-01-02 00:00:00,00:07:28,34.5293,-10.2384,17.7,1
2007-01-02 00:00:00,23:01:03,35.0617,-1.435,-17.1,2
2007-01-03 00:00:00,01:14:29,36.5685,0.9043,36.8,1
2007-01-03 00:00:00,05:03:51,34.1919,-12.5061,-48.9,1

However, my grouping code keeps raising this error:

KeyError: 'The grouper name timestamp is not found'

Any ideas?

2 answers:

Answer 0: (score: 1)

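You are getting this error because you have already set the timestamp column as the index, so there is no longer a column named timestamp for the grouper to look up. Either remove key='timestamp' from TimeGrouper() or drop the set_index call, and the grouping will work as expected:

# Option 1: keep set_index and drop key= (the grouper then uses the index)
daily = df.set_index('timestamp').groupby(pd.TimeGrouper(freq='D'))['impact'].count()

# Option 2: drop set_index and keep key= (the grouper then uses the column)
daily = df.groupby(pd.TimeGrouper(key='timestamp', freq='D'))['impact'].count()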
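The two options can be compared on the five sample rows from the question. This is only a sketch: it reads the rows from an in-memory string instead of the real file, and it uses pd.Grouper, which replaced pd.TimeGrouper in later pandas releases and accepts the same key and freq arguments. The variable names are made up for illustration.

```python
import io

import pandas as pd

# The five sample rows from the question, read from memory (assumption:
# the real file has the same header row).
CSV = """timestamp,heure,lat,lon,impact,type
2007-01-01 00:00:00,13:58:43,33.837,-9.205,10.3,1
2007-01-02 00:00:00,00:07:28,34.5293,-10.2384,17.7,1
2007-01-02 00:00:00,23:01:03,35.0617,-1.435,-17.1,2
2007-01-03 00:00:00,01:14:29,36.5685,0.9043,36.8,1
2007-01-03 00:00:00,05:03:51,34.1919,-12.5061,-48.9,1
"""

df = pd.read_csv(io.StringIO(CSV), parse_dates=["timestamp"])

# Option 1: timestamp is the index, so the grouper needs no key.
by_index = df.set_index("timestamp").groupby(pd.Grouper(freq="D"))["impact"].count()

# Option 2: timestamp stays a column, so the grouper names it with key=.
by_column = df.groupby(pd.Grouper(key="timestamp", freq="D"))["impact"].count()

# Mixing both (index already set AND key given) is the combination that is
# expected to raise the KeyError from the question.
try:
    df.set_index("timestamp").groupby(pd.Grouper(key="timestamp", freq="D"))["impact"].count()
    mixed_failed = False
except KeyError:
    mixed_failed = True

print(by_index.tolist())   # [1, 2, 2]
print(by_column.tolist())  # [1, 2, 2]
print(mixed_failed)
```

Both working options give the same daily counts; which one to pick depends only on whether the rest of the code wants timestamp as the index or as an ordinary column.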

Answer 1: (score: 1)

I believe you need DataFrame.resample.

You also need to convert the timestamp column to a DatetimeIndex, using the parse_dates and index_col parameters of read_csv:

import pandas as pd
import matplotlib.pyplot as plt

names = ["timestamp", "heure", "lat", "lon", "impact", "type"]
data = pd.read_csv('fou.txt', names=names, parse_dates=['timestamp'], index_col=['timestamp'])
print(data.head())

# your code
daily = data.groupby(pd.TimeGrouper(freq='D'))['impact'].count()
monthly = daily.groupby(pd.TimeGrouper(freq='M')).mean()
ax = monthly.plot(kind='bar')
plt.show()

# simpler alternative with resample
daily = data.resample('D')['impact'].count()
monthly = daily.resample('M').mean()
ax = monthly.plot(kind='bar')
plt.show()

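To see what the daily count followed by the monthly mean actually computes, here is a small self-contained sketch. The January rows come from the question; the February rows are invented for illustration, and the data is read from a string rather than from fou.txt. It uses the "MS" (month start) alias because newer pandas releases renamed the month-end alias "M" to "ME"; the plotting call is left as a comment so the snippet runs without a display.

```python
import io

import pandas as pd

# January rows are taken from the question; the February rows are
# hypothetical and only there to produce a second month.
CSV = """timestamp,impact
2007-01-01 00:00:00,10.3
2007-01-02 00:00:00,17.7
2007-01-02 00:00:00,-17.1
2007-01-03 00:00:00,36.8
2007-01-03 00:00:00,-48.9
2007-02-01 00:00:00,5.0
2007-02-01 00:00:00,6.0
2007-02-01 00:00:00,7.0
2007-02-01 00:00:00,8.0
"""

# parse_dates + index_col turn the timestamp column into a DatetimeIndex,
# which resample requires.
data = pd.read_csv(io.StringIO(CSV), parse_dates=["timestamp"], index_col="timestamp")

# One bin per calendar day between the first and last timestamp;
# days without rows get a count of 0.
daily = data.resample("D")["impact"].count()

# Mean of the daily counts within each month, labelled by month start.
monthly = daily.resample("MS").mean()

print(len(daily))  # 32 (2007-01-01 through 2007-02-01)
print(monthly)

# With matplotlib installed, the bar chart would be drawn with:
# monthly.plot(kind="bar")
```

Note that the empty days count as 0 and therefore pull the monthly mean down: January averages 5 events over 31 days here. If only days that have at least one event should be averaged, filtering first with daily[daily > 0] is one possible adjustment.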

Also check whether you really need count rather than size; see "What is the difference between size and count in pandas?":

daily = data.resample('D')['impact'].size()
monthly = daily.resample('M').mean()
ax = monthly.plot(kind='bar')
plt.show()
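The difference only shows up when the column contains missing values: count skips NaN, while size counts every row. A minimal sketch with made-up values, one of them missing:

```python
import numpy as np
import pandas as pd

# Three hypothetical readings; the first one on 2007-01-02 is missing.
idx = pd.to_datetime(["2007-01-01", "2007-01-02", "2007-01-02"])
s = pd.Series([10.3, np.nan, -17.1], index=idx, name="impact")

daily_count = s.resample("D").count()  # non-NaN values per day
daily_size = s.resample("D").size()    # rows per day, NaN included

print(daily_count.tolist())  # [1, 1]
print(daily_size.tolist())   # [1, 2]
```

If impact never contains NaN, both calls give the same result and the choice does not matter.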