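I have a CSV file and I am trying to plot the monthly average of some values. The file is structured as shown below, so I think I should group the data by day first and then by month in order to compute the mean.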
timestamp,heure,lat,lon,impact,type
2007-01-01 00:00:00,13:58:43,33.837,-9.205,10.3,1
2007-01-02 00:00:00,00:07:28,34.5293,-10.2384,17.7,1
2007-01-02 00:00:00,23:01:03,35.0617,-1.435,-17.1,2
2007-01-03 00:00:00,01:14:29,36.5685,0.9043,36.8,1
2007-01-03 00:00:00,05:03:51,34.1919,-12.5061,-48.9,1
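However, when I group on timestamp with pd.TimeGrouper, I keep getting this error:
KeyError: 'The grouper name timestamp is not found'
Any ideas?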
Answer 0 (score: 1)
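You are getting this error because you have set the timestamp column as the index, so there is no longer a column with that name for the grouper to find. Remove either key='timestamp' from TimeGrouper() or the set_index call, and the grouping should work as expected: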
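# timestamp is the index, so no key is passed (rows are grouped, so no axis=1 either)
daily = df.set_index('timestamp').groupby(pd.TimeGrouper(freq='D'))['impact'].count()
or
# timestamp stays a column, so it is named through key
daily = df.groupby(pd.TimeGrouper(key='timestamp', freq='D'))['impact'].count()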
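pd.TimeGrouper was removed in pandas 1.0, so on a current install the same two variants would be written with pd.Grouper. The following is only a sketch of that, assuming pandas 1.0 or later and a file without a header row; the five rows are the sample from the question, loaded from a string instead of a file.

```python
import io
import pandas as pd

# Sample rows copied from the question; assumed to have no header row,
# so the column names are passed explicitly.
csv_text = """2007-01-01 00:00:00,13:58:43,33.837,-9.205,10.3,1
2007-01-02 00:00:00,00:07:28,34.5293,-10.2384,17.7,1
2007-01-02 00:00:00,23:01:03,35.0617,-1.435,-17.1,2
2007-01-03 00:00:00,01:14:29,36.5685,0.9043,36.8,1
2007-01-03 00:00:00,05:03:51,34.1919,-12.5061,-48.9,1
"""
names = ["timestamp", "heure", "lat", "lon", "impact", "type"]
df = pd.read_csv(io.StringIO(csv_text), names=names, parse_dates=["timestamp"])

# Variant 1: timestamp is the index, so the grouper takes no key.
daily_from_index = (
    df.set_index("timestamp").groupby(pd.Grouper(freq="D"))["impact"].count()
)

# Variant 2: timestamp stays a column, so the grouper is told which one to use.
daily_from_column = df.groupby(pd.Grouper(key="timestamp", freq="D"))["impact"].count()

print(daily_from_index)
print(daily_from_column)
```

Both variants give the same daily counts; the only difference is where the grouper looks for the dates.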
Answer 1 (score: 1)
I believe you need DataFrame.resample.
You also need to convert timestamp to a DatetimeIndex, using the parse_dates and index_col parameters of read_csv.
import pandas as pd
import matplotlib.pyplot as plt
names = ["timestamp", "heure", "lat", "lon", "impact", "type"]
# parse_dates + index_col turn the timestamp column into a DatetimeIndex
data = pd.read_csv('fou.txt', names=names, parse_dates=['timestamp'], index_col=['timestamp'])
print(data.head())
# your code (pd.TimeGrouper was removed in pandas 1.0; pd.Grouper(freq=...) replaces it there)
daily = data.groupby(pd.TimeGrouper(freq='D'))['impact'].count()
monthly = daily.groupby(pd.TimeGrouper(freq='M')).mean()
ax = monthly.plot(kind='bar')
plt.show()
# simpler, with resample
daily = data.resample('D')['impact'].count()
monthly = daily.resample('M').mean()
ax = monthly.plot(kind='bar')
plt.show()
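One thing to keep in mind with the daily-then-monthly approach: resample('D') also creates a bin for every day between the first and last timestamp that has no rows, and those zero counts pull the monthly mean down. Here is a small sketch of the effect, using made-up timestamps (not the real file) and the 'MS' month-start alias, chosen only because both older and newer pandas versions accept it.

```python
import pandas as pd

# Hypothetical data: Jan 3 and Jan 4 have no rows at all.
idx = pd.to_datetime(["2007-01-01", "2007-01-02", "2007-01-02", "2007-01-05"])
data = pd.DataFrame({"impact": [10.3, 17.7, -17.1, 36.8]}, index=idx)

# Daily counts; the empty days show up as 0.
daily = data.resample("D")["impact"].count()

# Monthly mean over every day in the observed range (zeros included).
monthly_all_days = daily.resample("MS").mean()

# Monthly mean over only the days that actually had rows.
monthly_active_days = daily[daily > 0].resample("MS").mean()

print(daily)
print(monthly_all_days)
print(monthly_active_days)
```

Which of the two means is the right one depends on what you want the bar chart to say, so it is worth deciding explicitly.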
Also check whether you really need count rather than size:
What is the difference between size and count in pandas?
daily = data.resample('D')['impact'].size()
monthly = daily.resample('M').mean()
ax = monthly.plot(kind='bar')
plt.show()
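To make the distinction concrete: count skips missing values, while size counts every row. A minimal sketch with made-up values, assuming impact can contain NaN in your file:

```python
import numpy as np
import pandas as pd

# Hypothetical data: the first row on Jan 2 has a missing impact value.
idx = pd.to_datetime(["2007-01-01", "2007-01-02", "2007-01-02"])
data = pd.DataFrame({"impact": [10.3, np.nan, -17.1]}, index=idx)

daily_count = data.resample("D")["impact"].count()  # non-missing values per day
daily_size = data.resample("D")["impact"].size()    # rows per day, NaN included

print(daily_count)
print(daily_size)
```

If impact never contains missing values, the two give identical results.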