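I store data from multiple sensors in a CSV file.
The time resolution is one minute.
I can easily compute the daily average for each sensor.
I need to compute the hourly averages of the data.
CSV dataset format: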
5/27/2016 0:00:00 Temperature 39 25
5/27/2016 0:00:00 Pressure 12 39
5/27/2016 0:00:00 Humidity 19 79
.
.
.
5/27/2016 0:01:00 Temperature 39 25
The relevant part of my code:
import pandas as pd
import numpy as np
df = pd.read_csv('2016-05-27.csv')
Count_Row = df.shape[0]  # number of rows
print("The number of rows is", Count_Row)
Parameter_code = 11201
timestamp = []
Par_value = []
n = 0
for rown in range(0, Count_Row):
    # column 3 holds the parameter code, column 4 the value
    if df.iloc[rown, 3] == Parameter_code:
        temp = df.iloc[rown, 4]
        if temp != 'nan':
            Par_value.append(float(temp))
            timestamp.append(df.iloc[rown, 1])
            n += 1
print("total samples", n)
print(timestamp)
daily_avg = np.average(Par_value)
print("The daily average ", daily_avg)
Question: is there a way to compute the hourly averages like this?
Answer 0 (score: 2)
I think you should avoid loops in pandas because they are slow. Instead, use the names and parse_dates parameters of read_csv to get a datetime column, boolean indexing to filter the rows, and resample with the aggregate mean. NaN values are omitted by default.
import pandas as pd
from io import StringIO
temp = """
5/27/2016,0:00:00,Temperature,39,25
5/27/2016,0:00:00,Pressure,12,39
5/27/2016,0:00:00,Temperature,39,NaN
5/27/2016,0:01:00,Temperature,39,25"""
# after testing, replace 'StringIO(temp)' with 'filename.csv'
names = ['date','time','name','parameter code','value']
df = pd.read_csv(StringIO(temp), parse_dates=[['date','time']], names=names)
print (df)
            date_time         name  parameter code  value
0 2016-05-27 00:00:00  Temperature              39   25.0
1 2016-05-27 00:00:00     Pressure              12   39.0
2 2016-05-27 00:00:00  Temperature              39    NaN
3 2016-05-27 00:01:00  Temperature              39   25.0
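Recent pandas releases warn that the nested-list form of parse_dates is deprecated. As a hedged alternative, the sketch below reads the date and time as plain text and combines them with pd.to_datetime. The variable names sample and stamp are illustrative, and the explicit format string is an assumption inferred from the sample rows (month/day/year, 24-hour time).

```python
from io import StringIO

import pandas as pd

# Same sample rows as above; swap StringIO(sample) for the real file name.
sample = """5/27/2016,0:00:00,Temperature,39,25
5/27/2016,0:00:00,Pressure,12,39
5/27/2016,0:00:00,Temperature,39,NaN
5/27/2016,0:01:00,Temperature,39,25"""

names = ['date', 'time', 'name', 'parameter code', 'value']
df = pd.read_csv(StringIO(sample), names=names)

# Join the two text columns and parse them in one pass.
# The format is assumed from the sample rows, not confirmed by the source.
stamp = pd.to_datetime(df['date'] + ' ' + df['time'],
                       format='%m/%d/%Y %H:%M:%S')

# Put the combined timestamp first and drop the two source columns.
df = df.drop(columns=['date', 'time'])
df.insert(0, 'date_time', stamp)
print(df)
```

This should give the same four columns as the read_csv call above, with date_time as a datetime column.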
df = df[df['parameter code'] == 39]
print (df)
            date_time         name  parameter code  value
0 2016-05-27 00:00:00  Temperature              39   25.0
2 2016-05-27 00:00:00  Temperature              39    NaN
3 2016-05-27 00:01:00  Temperature              39   25.0
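Row 2 still holds a NaN after filtering, and it does not have to be removed by hand: pandas reductions such as mean skip missing values by default (skipna=True). A small sketch using the three values shown above; the variable names are illustrative only.

```python
import numpy as np
import pandas as pd

# Values of the three filtered Temperature rows shown above.
values = pd.Series([25.0, np.nan, 25.0])

# mean() ignores NaN unless skipna=False is passed.
mean_skipping_nan = values.mean()
mean_with_nan = values.mean(skipna=False)

print(mean_skipping_nan)  # 25.0
print(mean_with_nan)      # nan
```

This is why no explicit check for 'nan' is needed before averaging.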
df1 = df.resample('H', on='date_time')['value'].mean().reset_index(name='mean_val')
print (df1)
   date_time  mean_val
0 2016-05-27      25.0
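Since the file contains several sensors, the same idea can probably be extended to all of them at once by grouping on the sensor name together with an hourly time bin. The sketch below is not part of the original answer; it uses made-up readings and pd.Grouper, and it spells the frequency as '60min' because the hourly alias ('H' versus 'h') differs between pandas versions.

```python
import pandas as pd

# Hypothetical readings: two sensors, spread over two different hours.
df = pd.DataFrame({
    'date_time': pd.to_datetime([
        '2016-05-27 00:00:00', '2016-05-27 00:01:00',
        '2016-05-27 01:00:00', '2016-05-27 00:00:00',
        '2016-05-27 01:30:00',
    ]),
    'name': ['Temperature', 'Temperature', 'Temperature',
             'Pressure', 'Pressure'],
    'value': [25.0, 27.0, 30.0, 39.0, 41.0],
})

# Group by sensor name and by 60-minute bins of the timestamp,
# then average the values inside each (sensor, hour) group.
hourly = (
    df.groupby(['name', pd.Grouper(key='date_time', freq='60min')])['value']
      .mean()
      .reset_index(name='mean_val')
)
print(hourly)
```

Each row of hourly is one sensor in one hour, so a single sensor can still be selected afterwards with boolean indexing on the name column.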