I have a data file like this:
"16-feb-05 07:10", 6.8, 8.1, 0.0214, 1.8105, 0.0214, 1.6985, 1.00, 2.631
"16-feb-05 07:15", 18.3, 8.1, 0.0214, 1.8093, 0.0214, 1.6977, 1.00, 2.656
"16-feb-05 07:20", 12.7, 8.1, 0.0214, 1.8083, 0.0214, 1.6971, 1.00, 2.673
...
...
"01-mar-05 00:00", -10.1, 7.9, 0.0214, 1.3718, 0.0214, 1.6761, 1.00,29.419
"01-mar-05 00:05", 5.1, 7.9, 0.0214, 1.3722, 0.0214, 1.6767, 1.00,29.425
"01-mar-05 00:10", -3.4, 7.9, 0.0214, 1.3728, 0.0214, 1.6774, 1.00,29.421
... after which the data switches to hourly readings
"02-dec-06 13:00", -2.8, 7.5, 0.0214, 1.0499, 0.0214, 1.5777, 1.00,46.429
"02-dec-06 14:00", 3.4, 7.5, 0.0214, 1.0488, 0.0214, 1.5767, 1.00,46.482
I would like to average the second column (Conc) over 5-minute intervals and ignore the other columns.
I tried:
names=['Date','Conc','Flow','SZ','SB','RZ','RB','Fraction','Attenuation']
px_all=pd.read_csv('Output1.csv',parse_dates=True,index_col=0,names=names)
close_px=px_all[['Conc']] # only interested in this one column
close_px.resample('5min',how='sum')
Then it says ' Conc' is not in the index. Any suggestions? Thanks in advance!
Answer:
It works fine for me:
# pandas 0.18.0 and later (current versions no longer accept the how= keyword):
df1 = close_px[['Conc']].resample('5min').sum()
# pandas below 0.18.0:
#df1 = close_px[['Conc']].resample('5min', how='sum')
print (df1)
Conc
Date
2005-02-16 07:10:00 6.8
2005-02-16 07:15:00 18.3
2005-02-16 07:20:00 12.7
2005-02-16 07:25:00 NaN
2005-02-16 07:30:00 NaN
2005-02-16 07:35:00 NaN
2005-02-16 07:40:00 NaN
...
...
Code:
import pandas as pd
import io
temp=u"""16-feb-05 07:10,6.8,8.1,0.0214,1.8105,0.0214,1.6985,1.00,2.631
16-feb-05 07:15,18.3,8.1,0.0214,1.8093,0.0214,1.6977,1.00,2.656
16-feb-05 07:20,12.7,8.1,0.0214,1.8083,0.0214,1.6971,1.00,2.673
01-mar-05 00:00,-10.1,7.9,0.0214,1.3718,0.0214,1.6761,1.00,29.419
01-mar-05 00:05,5.1,7.9,0.0214,1.3722,0.0214,1.6767,1.00,29.425
01-mar-05 00:10,-3.4,7.9,0.0214,1.3728,0.0214,1.6774,1.00,29.421
02-dec-06 13:00,-2.8,7.5,0.0214,1.0499,0.0214,1.5777,1.00,46.429
02-dec-06 14:00,3.4,7.5,0.0214,1.0488,0.0214,1.5767,1.00,46.482"""
names=['Date','Conc','Flow','SZ','SB','RZ','RB','Fraction','Attenuation']
# after testing, replace io.StringIO(temp) with the filename
df = pd.read_csv(io.StringIO(temp), index_col=0, names=names, parse_dates=True)
print (df)
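Because the dates use an abbreviated month name (`16-feb-05 07:10`), it can be safer to give the format explicitly than to rely on `parse_dates=True` inference. The following is a minimal sketch, assuming the file has no header row (as in the answer) and that only `Date` and `Conc` are needed, so the remaining columns are skipped at read time with `usecols`. The three rows are copied from the question.

```python
import io

import pandas as pd

# Three sample rows from the question; replace io.StringIO(temp) with the real filename.
temp = """16-feb-05 07:10,6.8,8.1,0.0214,1.8105,0.0214,1.6985,1.00,2.631
16-feb-05 07:15,18.3,8.1,0.0214,1.8093,0.0214,1.6977,1.00,2.656
16-feb-05 07:20,12.7,8.1,0.0214,1.8083,0.0214,1.6971,1.00,2.673"""
names = ['Date', 'Conc', 'Flow', 'SZ', 'SB', 'RZ', 'RB', 'Fraction', 'Attenuation']

# All nine names are still passed so pandas knows how the file's columns are labelled,
# but usecols keeps only the two columns that are actually needed.
df = pd.read_csv(io.StringIO(temp), names=names, usecols=['Date', 'Conc'])

# Parse "16-feb-05 07:10" as day-abbreviated month-two-digit year hour:minute.
df['Date'] = pd.to_datetime(df['Date'], format='%d-%b-%y %H:%M')
df = df.set_index('Date')
print(df)
```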
Conc Flow SZ SB RZ RB Fraction \
Date
2005-02-16 07:10:00 6.8 8.1 0.0214 1.8105 0.0214 1.6985 1.0
2005-02-16 07:15:00 18.3 8.1 0.0214 1.8093 0.0214 1.6977 1.0
2005-02-16 07:20:00 12.7 8.1 0.0214 1.8083 0.0214 1.6971 1.0
2005-03-01 00:00:00 -10.1 7.9 0.0214 1.3718 0.0214 1.6761 1.0
2005-03-01 00:05:00 5.1 7.9 0.0214 1.3722 0.0214 1.6767 1.0
2005-03-01 00:10:00 -3.4 7.9 0.0214 1.3728 0.0214 1.6774 1.0
2006-12-02 13:00:00 -2.8 7.5 0.0214 1.0499 0.0214 1.5777 1.0
2006-12-02 14:00:00 3.4 7.5 0.0214 1.0488 0.0214 1.5767 1.0
Attenuation
Date
2005-02-16 07:10:00 2.631
2005-02-16 07:15:00 2.656
2005-02-16 07:20:00 2.673
2005-03-01 00:00:00 29.419
2005-03-01 00:05:00 29.425
2005-03-01 00:10:00 29.421
2006-12-02 13:00:00 46.429
2006-12-02 14:00:00 46.482
df1 = df[['Conc']].resample('5min').sum()
print (df1)
Conc
Date
2005-02-16 07:10:00 6.8
2005-02-16 07:15:00 18.3
2005-02-16 07:20:00 12.7
2005-02-16 07:25:00 NaN
2005-02-16 07:30:00 NaN
2005-02-16 07:35:00 NaN
2005-02-16 07:40:00 NaN
2005-02-16 07:45:00 NaN
2005-02-16 07:50:00 NaN
2005-02-16 07:55:00 NaN
2005-02-16 08:00:00 NaN
...
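The `NaN` rows above reflect pandas versions before 0.22. From 0.22 on, summing an empty bin returns `0.0` by default, so gaps in the data would show up as zeros; passing `min_count=1` to `sum` restores the `NaN` result. A small sketch on a hypothetical two-point series (not the question's data):

```python
import pandas as pd

# Hypothetical series with readings 15 minutes apart, so two 5-minute bins are empty.
s = pd.Series(
    [6.8, 12.7],
    index=pd.to_datetime(['2005-02-16 07:10', '2005-02-16 07:25']),
    name='Conc',
)

# Default behaviour (pandas >= 0.22): empty bins sum to 0.0.
zero_filled = s.resample('5min').sum()
# min_count=1 requires at least one value in a bin, otherwise the result is NaN.
nan_filled = s.resample('5min').sum(min_count=1)

print(zero_filled)
print(nan_filled)
```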
If you need the output as a Series:
df1 = df.resample('5min')['Conc'].sum()
print (df1)
Date
2005-02-16 07:10:00 6.8
2005-02-16 07:15:00 18.3
2005-02-16 07:20:00 12.7
2005-02-16 07:25:00 NaN
2005-02-16 07:30:00 NaN
2005-02-16 07:35:00 NaN
...
...
2006-12-02 13:45:00 NaN
2006-12-02 13:50:00 NaN
2006-12-02 13:55:00 NaN
2006-12-02 14:00:00 3.4
Freq: 5T, Name: Conc, dtype: float64
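The question asks for an average rather than a sum. A minimal sketch, assuming `mean` is what is wanted and that the empty bins (from the long gaps and from the hourly part of the data) should be dropped rather than kept as `NaN`. With data that is already at 5-minute spacing, each bin holds at most one reading, so the mean simply equals that reading; it only changes values when a bin contains several samples. Dates are parsed with an explicit format here to avoid depending on inference.

```python
import io

import pandas as pd

# Sample rows from the question, including the hourly section at the end.
temp = """16-feb-05 07:10,6.8,8.1,0.0214,1.8105,0.0214,1.6985,1.00,2.631
16-feb-05 07:15,18.3,8.1,0.0214,1.8093,0.0214,1.6977,1.00,2.656
16-feb-05 07:20,12.7,8.1,0.0214,1.8083,0.0214,1.6971,1.00,2.673
02-dec-06 13:00,-2.8,7.5,0.0214,1.0499,0.0214,1.5777,1.00,46.429
02-dec-06 14:00,3.4,7.5,0.0214,1.0488,0.0214,1.5767,1.00,46.482"""
names = ['Date', 'Conc', 'Flow', 'SZ', 'SB', 'RZ', 'RB', 'Fraction', 'Attenuation']

df = pd.read_csv(io.StringIO(temp), index_col=0, names=names)
df.index = pd.to_datetime(df.index, format='%d-%b-%y %H:%M')

# Average Conc within each 5-minute bin; bins without any reading come out as NaN.
conc_5min = df['Conc'].resample('5min').mean()
# Keep only the bins that actually contain data.
conc_5min = conc_5min.dropna()
print(conc_5min)
```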