Python: calculate the average for each hour from a CSV?

Asked: 2016-10-26 07:18:33

Tags: python csv pandas average hour

I want to calculate the average for each hour from a CSV file.

Here is my dataset:

Timestamp    Temperature
9/1/2016 0:00:08    53.8
9/1/2016 0:00:38    53.8
9/1/2016 0:01:08    53.8
9/1/2016 0:01:38    53.8
9/1/2016 0:02:08    53.8
9/1/2016 0:02:38    54.1
9/1/2016 0:03:08    54.1
9/1/2016 0:03:38    54.1
9/1/2016 0:04:38    54
9/1/2016 0:05:38    54
9/1/2016 0:06:08    54
9/1/2016 0:06:38    54
9/1/2016 0:07:08    54
9/1/2016 0:07:38    54
9/1/2016 0:08:08    54.1
9/1/2016 0:08:38    54.1
9/1/2016 0:09:38    54.1
9/1/2016 0:10:32    54
9/1/2016 0:11:02    54
9/1/2016 0:11:32    54
9/1/2016 0:00:08    54
9/2/2016 0:00:20    32
9/2/2016 0:00:50    32
9/2/2016 0:01:20    32
9/2/2016 0:01:50    32
9/2/2016 0:02:20    32
9/2/2016 0:02:50    32
9/2/2016 0:03:20    32
9/2/2016 0:03:50    32
9/2/2016 0:04:20    32
9/2/2016 0:04:50    32
9/2/2016 0:05:20    32
9/2/2016 0:05:50    32
9/2/2016 0:06:20    32
9/2/2016 0:06:50    32
9/2/2016 0:07:20    32
9/2/2016 0:07:50    32

This is my code for calculating the daily average, but I want it per hour:

from datetime import datetime
import pandas

def same_day(date_string):  # Drop the year and the time, keep month and day
    return datetime.strptime(date_string, "%m/%d/%Y %H:%M:%S").strftime('%m%d')

df = pandas.read_csv('/home/kk/Desktop/cal_Avg.csv', index_col=0, usecols=[0, 1], names=['Timestamp', 'Discharge'], converters={'Timestamp': same_day})

print(df.groupby(level=0).mean())

The output I want looks like this:

Timestamp              Temp          *        Avg
9/1/2016 0:00:08    53.8
9/1/2016 0:00:38    53.8    ?avg for this hour
9/1/2016 0:01:08    53.8
9/1/2016 0:01:38    53.8    ?avg for this hour
9/1/2016 0:02:08    53.8
9/1/2016 0:02:38    54.1

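Now I want the average and the minimum for each specific hour.

Expected output:

Here I have printed the output for only 5 hours, for the dates 01-09-2016 and 02-09-16: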

010900              54.362727         45.497273
010901              54.723276         45.068103
010902              54.746847         45.370270
010903              54.833913         44.931304
010904              54.971053         44.835088
010905              55.519444         44.459259
020901              31.742553         55.640426
020902              31.495556         55.655556
020903              31.304348         55.442609
020904              31.200000         55.437273
020905              31.294382         55.442697

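So I need it per specific date and specific hour. How can I achieve this?

2 Answers:

Answer 0: (score: 0)

I think you first need the read_csv parameters index_col=[0], to read the first column into the index, and parse_dates=[0], to parse that column into a DatetimeIndex:

import pandas as pd

df = pd.read_csv('filename', index_col=[0], parse_dates=[0], usecols=[0,1])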
print (df)
                     Temperature
Timestamp                       
2016-09-01 00:00:08         53.8
2016-09-01 00:00:38         53.8
2016-09-01 00:01:08         53.8
2016-09-01 00:01:38         53.8
2016-09-01 00:02:08         53.8
2016-09-01 00:02:38         54.1
2016-09-01 00:03:08         54.1
...
...
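
To make the read_csv step easy to try without the original file, here is a minimal sketch that parses a few rows copied from the question. It assumes the real file is comma-delimited with a header row; csv_text is a hypothetical stand-in for that file.

```python
import io

import pandas as pd

# Assumption: comma-delimited text with a header row, using rows from the question.
csv_text = """Timestamp,Temperature
9/1/2016 0:00:08,53.8
9/1/2016 0:00:38,53.8
9/1/2016 0:02:38,54.1
9/2/2016 0:00:20,32
9/2/2016 0:00:50,32
"""

# index_col=[0] moves the first column into the index,
# parse_dates=[0] converts that column to datetimes while reading.
df = pd.read_csv(io.StringIO(csv_text), index_col=[0], parse_dates=[0], usecols=[0, 1])

# The index should now be a DatetimeIndex rather than plain strings.
print(type(df.index).__name__)
print(df.head())
```

If the index stays a plain string index, the date format was probably not recognised, and the resample and groupby steps that rely on a DatetimeIndex will not work.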

Then use resample by hours and aggregate with Resampler.mean. Note that hours with no data in the DatetimeIndex come out as NaN:

print (df.resample('H').mean())
                     Temperature
Timestamp                       
2016-09-01 00:00:00    53.980952
2016-09-01 01:00:00          NaN
2016-09-01 02:00:00          NaN
2016-09-01 03:00:00          NaN
2016-09-01 04:00:00          NaN
2016-09-01 05:00:00          NaN
2016-09-01 06:00:00          NaN
2016-09-01 07:00:00          NaN
2016-09-01 08:00:00          NaN
2016-09-01 09:00:00          NaN
2016-09-01 10:00:00          NaN
2016-09-01 11:00:00          NaN
2016-09-01 12:00:00          NaN
2016-09-01 13:00:00          NaN
2016-09-01 14:00:00          NaN
2016-09-01 15:00:00          NaN
2016-09-01 16:00:00          NaN
2016-09-01 17:00:00          NaN
2016-09-01 18:00:00          NaN
2016-09-01 19:00:00          NaN
2016-09-01 20:00:00          NaN
2016-09-01 21:00:00          NaN
2016-09-01 22:00:00          NaN
2016-09-01 23:00:00          NaN
2016-09-02 00:00:00    32.000000
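
If the empty hours are not wanted, one option is to drop them after resampling. The sketch below is only an illustration on a tiny hand-made frame (the values are assumptions, not the full dataset), and it uses the lowercase "h" alias, which newer pandas versions prefer over "H".

```python
import pandas as pd

# Tiny illustrative frame: two readings on each of two days.
idx = pd.to_datetime([
    "2016-09-01 00:00:08",
    "2016-09-01 00:00:38",
    "2016-09-02 00:00:20",
    "2016-09-02 00:00:50",
])
df = pd.DataFrame({"Temperature": [53.8, 54.1, 32.0, 32.0]}, index=idx)

# resample creates one bin for every hour between the first and last reading,
# so hours without data show up as NaN.
hourly = df.resample("h").mean()

# dropna keeps only the hours that actually contain readings.
non_empty = hourly.dropna()

print(len(hourly))
print(non_empty)
```

Whether to drop the NaN rows depends on the use case: keeping them makes gaps in the data visible, while dropping them gives a compact table.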

Another solution is to drop the minutes and seconds by casting the index values to hour precision, then groupby on the resulting array:

print (df.index.values.astype('<M8[h]'))
['2016-09-01T00' '2016-09-01T00' '2016-09-01T00' '2016-09-01T00'
 '2016-09-01T00' '2016-09-01T00' '2016-09-01T00' '2016-09-01T00'
 '2016-09-01T00' '2016-09-01T00' '2016-09-01T00' '2016-09-01T00'
 '2016-09-01T00' '2016-09-01T00' '2016-09-01T00' '2016-09-01T00'
 '2016-09-01T00' '2016-09-01T00' '2016-09-01T00' '2016-09-01T00'
 '2016-09-01T00' '2016-09-02T00' '2016-09-02T00' '2016-09-02T00'
 '2016-09-02T00' '2016-09-02T00' '2016-09-02T00' '2016-09-02T00'
 '2016-09-02T00' '2016-09-02T00' '2016-09-02T00' '2016-09-02T00'
 '2016-09-02T00' '2016-09-02T00' '2016-09-02T00' '2016-09-02T00'
 '2016-09-02T00']

print (df.groupby([df.index.values.astype('<M8[h]')]).mean())
            Temperature
2016-09-01    53.980952
2016-09-02    32.000000
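
A variation on the same idea, not taken from the answer itself, is DatetimeIndex.floor, which truncates each timestamp to the start of its hour while keeping a datetime type. This is a minimal sketch on made-up readings:

```python
import pandas as pd

# Illustrative readings spread over two hours of one day and one hour of the next.
idx = pd.to_datetime([
    "2016-09-01 00:00:08",
    "2016-09-01 00:59:59",
    "2016-09-01 01:00:00",
    "2016-09-02 00:00:20",
])
df = pd.DataFrame({"Temperature": [53.8, 54.0, 55.0, 32.0]}, index=idx)

# floor("h") maps every timestamp to the beginning of its hour.
hour_start = df.index.floor("h")

# Grouping on the floored index only yields hours that contain data,
# so no NaN rows appear for the empty hours in between.
by_hour = df.groupby(hour_start).mean()
print(by_hour)
```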

Also, if you need the mean by month, day and hour, you can build string keys with DatetimeIndex.strftime and groupby on them:

print (df.index.strftime('%m%d%H'))
['090100' '090100' '090100' '090100' '090100' '090100' '090100' '090100'
 '090100' '090100' '090100' '090100' '090100' '090100' '090100' '090100'
 '090100' '090100' '090100' '090100' '090100' '090200' '090200' '090200'
 '090200' '090200' '090200' '090200' '090200' '090200' '090200' '090200'
 '090200' '090200' '090200' '090200' '090200']

print (df.groupby([df.index.strftime('%m%d%H')]).mean())
        Temperature
090100    53.980952
090200    32.000000

Or, if you need the mean by hour of day only, groupby on DatetimeIndex.hour:

print (df.index.hour)
[0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0]

print (df.groupby([df.index.hour]).mean())
   Temperature
0    44.475676
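
The question also asks for the minimum and shows keys such as 010900, which look like day, month, hour. As a hedged sketch, the same groupby can return several statistics at once with agg; the '%d%m%H' format and the sample values below are assumptions made to mirror that layout.

```python
import pandas as pd

# Illustrative readings, not the full dataset from the question.
idx = pd.to_datetime([
    "2016-09-01 00:00:08",
    "2016-09-01 00:30:00",
    "2016-09-01 01:10:00",
    "2016-09-02 00:00:20",
])
df = pd.DataFrame({"Temperature": [53.8, 54.2, 55.0, 32.0]}, index=idx)

# Day, month, hour as one string key, e.g. "010900" for 1 September, hour 00.
keys = df.index.strftime("%d%m%H")

# agg with a list of function names returns one column per statistic.
stats = df.groupby(keys)["Temperature"].agg(["mean", "min"])
print(stats)
```

String keys sort alphabetically, so a day-first key orders rows by day of month before month. Putting the month first, as in '%m%d%H', keeps rows in calendar order within a year.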


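Answer 1: (score: 0)

I would first define a new column hour for readability, then groupby:

import pandas as pd

df = pd.read_csv('/home/kk/Desktop/cal_Avg.csv')  # DataFrame.from_csv was removed in pandas 1.0
# Drop the trailing ":MM:SS" (6 characters) so only the date and the hour remain
df['hour'] = df['Timestamp'].apply(lambda s: s[:-6])
print(df[['hour', 'Temperature']].groupby('hour').mean())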
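
Fixed-width slicing depends on every timestamp ending in the same number of characters. A slightly more forgiving variant, shown here only as a sketch on assumed comma-delimited sample rows, keeps everything before the first colon instead:

```python
import io

import pandas as pd

# Assumption: comma-delimited text with a header row; values are illustrative.
csv_text = """Timestamp,Temperature
9/1/2016 0:00:08,53.8
9/1/2016 0:30:38,54.2
9/1/2016 10:02:38,60.0
9/2/2016 0:00:20,32
"""

df = pd.read_csv(io.StringIO(csv_text))

# Everything before the first ":" is the date plus the hour, e.g. "9/1/2016 10".
df["hour"] = df["Timestamp"].str.split(":").str[0]

result = df.groupby("hour")["Temperature"].mean()
print(result)
```

The keys here are plain strings, so they sort alphabetically rather than chronologically. Parsing the column to datetimes first avoids that if ordering matters.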