Question

I have a DataFrame like this:

"16-feb-05 07:10",   6.8,  8.1, 0.0214, 1.8105, 0.0214, 1.6985, 1.00, 2.631
"16-feb-05 07:15",  18.3,  8.1, 0.0214, 1.8093, 0.0214, 1.6977, 1.00, 2.656
"16-feb-05 07:20",  12.7,  8.1, 0.0214, 1.8083, 0.0214, 1.6971, 1.00, 2.673
...
...

"01-mar-05 00:00", -10.1,  7.9, 0.0214, 1.3718, 0.0214, 1.6761, 1.00,29.419
"01-mar-05 00:05",   5.1,  7.9, 0.0214, 1.3722, 0.0214, 1.6767, 1.00,29.425
"01-mar-05 00:10",  -3.4,  7.9, 0.0214, 1.3728, 0.0214, 1.6774, 1.00,29.421

...and then the data switches to hourly readings

"02-dec-06 13:00",  -2.8,  7.5, 0.0214, 1.0499, 0.0214, 1.5777, 1.00,46.429
"02-dec-06 14:00",   3.4,  7.5, 0.0214, 1.0488, 0.0214, 1.5767, 1.00,46.482

I would like to average the second column over every 5-minute interval and ignore the rest of the columns.

I tried:

names=['Date','Conc','Flow','SZ','SB','RZ','RB','Fraction','Attenuation']
px_all=pd.read_csv('Output1.csv',parse_dates=True,index_col=0,names=names)
close_px=px_all[['Conc']] # only interested in this one column

close_px.resample('5min',how='sum')

But then it says 'Conc' is not an index. Does anyone have a suggestion? Thanks in advance!

Answer 1

It works fine for me:

# pandas 0.18.0 and later
#df1 = close_px[['Conc']].resample('5min').sum()
# pandas below 0.18.0
df1 = close_px[['Conc']].resample('5min',how='sum')
print (df1)
                     Conc
Date                     
2005-02-16 07:10:00   6.8
2005-02-16 07:15:00  18.3
2005-02-16 07:20:00  12.7
2005-02-16 07:25:00   NaN
2005-02-16 07:30:00   NaN
2005-02-16 07:35:00   NaN
2005-02-16 07:40:00   NaN
...
...

Code:

import pandas as pd
import io

temp=u"""16-feb-05 07:10,6.8,8.1,0.0214,1.8105,0.0214,1.6985,1.00,2.631
16-feb-05 07:15,18.3,8.1,0.0214,1.8093,0.0214,1.6977,1.00,2.656
16-feb-05 07:20,12.7,8.1,0.0214,1.8083,0.0214,1.6971,1.00,2.673
01-mar-05 00:00,-10.1,7.9,0.0214,1.3718,0.0214,1.6761,1.00,29.419
01-mar-05 00:05,5.1,7.9,0.0214,1.3722,0.0214,1.6767,1.00,29.425
01-mar-05 00:10,-3.4,7.9,0.0214,1.3728,0.0214,1.6774,1.00,29.421
02-dec-06 13:00,-2.8,7.5,0.0214,1.0499,0.0214,1.5777,1.00,46.429
02-dec-06 14:00,3.4,7.5,0.0214,1.0488,0.0214,1.5767,1.00,46.482"""

names=['Date','Conc','Flow','SZ','SB','RZ','RB','Fraction','Attenuation']
# after testing, replace io.StringIO(temp) with the filename
df = pd.read_csv(io.StringIO(temp), index_col=0, names=names, parse_dates=True)
print (df)
                     Conc  Flow      SZ      SB      RZ      RB  Fraction  \
Date                                                                        
2005-02-16 07:10:00   6.8   8.1  0.0214  1.8105  0.0214  1.6985       1.0   
2005-02-16 07:15:00  18.3   8.1  0.0214  1.8093  0.0214  1.6977       1.0   
2005-02-16 07:20:00  12.7   8.1  0.0214  1.8083  0.0214  1.6971       1.0   
2005-03-01 00:00:00 -10.1   7.9  0.0214  1.3718  0.0214  1.6761       1.0   
2005-03-01 00:05:00   5.1   7.9  0.0214  1.3722  0.0214  1.6767       1.0   
2005-03-01 00:10:00  -3.4   7.9  0.0214  1.3728  0.0214  1.6774       1.0   
2006-12-02 13:00:00  -2.8   7.5  0.0214  1.0499  0.0214  1.5777       1.0   
2006-12-02 14:00:00   3.4   7.5  0.0214  1.0488  0.0214  1.5767       1.0   

                     Attenuation  
Date                              
2005-02-16 07:10:00        2.631  
2005-02-16 07:15:00        2.656  
2005-02-16 07:20:00        2.673  
2005-03-01 00:00:00       29.419  
2005-03-01 00:05:00       29.425  
2005-03-01 00:10:00       29.421  
2006-12-02 13:00:00       46.429  
2006-12-02 14:00:00       46.482  
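
I could not reproduce the error from the question, but one thing worth checking is whether the dates were really parsed: resample only works on a DatetimeIndex, TimedeltaIndex or PeriodIndex. A minimal sketch of that check, assuming the dates always look like 16-feb-05 07:10 and that month abbreviations are English; the name chk and the format string are my own assumptions, not something from the question:

```python
import io
import pandas as pd

# Two rows copied from the sample data above
raw = u"""16-feb-05 07:10,6.8,8.1,0.0214,1.8105,0.0214,1.6985,1.00,2.631
16-feb-05 07:15,18.3,8.1,0.0214,1.8093,0.0214,1.6977,1.00,2.656"""

names = ['Date', 'Conc', 'Flow', 'SZ', 'SB', 'RZ', 'RB', 'Fraction', 'Attenuation']

# Read the first column as plain strings (no parse_dates) ...
chk = pd.read_csv(io.StringIO(raw), index_col=0, names=names)

# ... then convert it with an explicit format instead of relying on inference
# (assumed layout: day, abbreviated month name, two-digit year, hour:minute)
chk.index = pd.to_datetime(chk.index, format='%d-%b-%y %H:%M')

# resample needs this to be True
print(isinstance(chk.index, pd.DatetimeIndex))
```

If this prints False on the real file, the problem is in the date parsing rather than in the resample call.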

df1 = df[['Conc']].resample('5min').sum()
print (df1)
                     Conc
Date                     
2005-02-16 07:10:00   6.8
2005-02-16 07:15:00  18.3
2005-02-16 07:20:00  12.7
2005-02-16 07:25:00   NaN
2005-02-16 07:30:00   NaN
2005-02-16 07:35:00   NaN
2005-02-16 07:40:00   NaN
2005-02-16 07:45:00   NaN
2005-02-16 07:50:00   NaN
2005-02-16 07:55:00   NaN
2005-02-16 08:00:00   NaN
...
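
The question asks for an average, while the code above sums. A minimal sketch of the same idea with mean(), assuming the empty 5-minute bins between readings are not wanted and can be dropped; the names conc and avg are only for illustration, and the timestamps are the sample rows written in ISO form:

```python
import pandas as pd

# The 'Conc' readings from the sample data, indexed by timestamp
idx = pd.to_datetime(['2005-02-16 07:10', '2005-02-16 07:15', '2005-02-16 07:20',
                      '2005-03-01 00:00', '2005-03-01 00:05', '2005-03-01 00:10',
                      '2006-12-02 13:00', '2006-12-02 14:00'])
conc = pd.Series([6.8, 18.3, 12.7, -10.1, 5.1, -3.4, -2.8, 3.4],
                 index=idx, name='Conc')

# Average within each 5-minute bin; bins with no readings become NaN
# and are removed by dropna()
avg = conc.resample('5min').mean().dropna()
print(avg)
```

With at most one reading per 5-minute bin, as in this sample, the mean is just that reading; the averaging only matters where several readings fall into the same bin.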

If you need the output as a Series:

df1 = df.resample('5min')['Conc'].sum()
print (df1)
Date
2005-02-16 07:10:00     6.8
2005-02-16 07:15:00    18.3
2005-02-16 07:20:00    12.7
2005-02-16 07:25:00     NaN
2005-02-16 07:30:00     NaN
2005-02-16 07:35:00     NaN
...
...
2006-12-02 13:45:00     NaN
2006-12-02 13:50:00     NaN
2006-12-02 13:55:00     NaN
2006-12-02 14:00:00     3.4
Freq: 5T, Name: Conc, dtype: float64
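
Note that the NaN values in the outputs above come from an older pandas. As far as I know, since pandas 0.22 sum() returns 0 for an empty bin unless min_count is set. A small sketch of the difference, using two of the sample readings; conc, default_sum and nan_sum are illustrative names:

```python
import pandas as pd

# Two readings with an empty 5-minute bin (07:15) between them
idx = pd.to_datetime(['2005-02-16 07:10', '2005-02-16 07:20'])
conc = pd.Series([6.8, 12.7], index=idx, name='Conc')

# Default on recent pandas: the empty bin sums to 0.0
default_sum = conc.resample('5min').sum()

# min_count=1 requires at least one value per bin, otherwise the result is NaN
nan_sum = conc.resample('5min').sum(min_count=1)

print(default_sum)
print(nan_sum)
```

Using min_count=1 keeps missing intervals distinguishable from intervals whose readings really add up to zero.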
