Averaging specific rows in a DataFrame

Time: 2015-07-29 07:16:20

Tags: python arrays datetime pandas average

I am trying to get the average of the following time periods in a DataFrame called `ave_data` (shown below):

  1. 5 AM to 9 AM (4 hours)
  2. 9 AM to 6 PM (9 hours)
  3. 6 PM to 10 PM (4 hours)
  4. 10 PM to 5 AM (7 hours)

Currently my `ave_data` DataFrame outputs the following data:

    Time                        F1         F2         F3
    2082-05-03 00:00:00  43.005593 -56.509746  25.271271
    2082-05-03 01:00:00  55.114918 -59.173852  31.849262
    2082-05-03 02:00:00  63.990762 -64.699492  52.426017
    2082-05-03 03:00:00  56.508333 -65.489083  36.188083
    2082-05-03 04:00:00  36.217295 -59.198033   2.404426
    2082-05-03 05:00:00  36.153814 -62.156187  24.779830
    2082-05-03 06:00:00  93.920334 -57.923000  77.654250
    ...

I would like to save these new averages in a new DataFrame like this (the numbers below are just random examples):

    Time                       F1         F2         F3
    Morning (5AM-9AM)   50.005987 -60.509746  29.311271
    Day (9AM-6PM)       59.005987 -49.509746  98.311271
    Evening (6PM-10PM)  55.018887 -47.614622  29.311271
    Night (10PM-5AM)    55.018887 -47.614622  29.311271

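Ideally I would also add a final column containing the average of each row. The code will be used to read different files, which will produce a different number of columns than shown here, so it would be great if someone could help me develop a general method.

Here is the relevant part of my code: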

    import pandas as pd

    raw_data = pd.read_excel(r'/Users/linnkloster/Desktop/Results/01_05_2012 Raw Results.xls',
                             skiprows=1, header=0, nrows=1440)
    raw_data['Time'] = pd.to_datetime(raw_data['Time'], unit='d')
    # Converts first column to datetime, to make averaging easier
    # Note this gets the wrong date (2082-05-03) but correct hour
    raw_data.set_index(pd.DatetimeIndex(raw_data['Time']), inplace=True)
    # Hourly means of the numeric columns (older pandas: resample('h', how='mean'))
    ave_data = raw_data.resample('h').mean(numeric_only=True)
    print(ave_data)
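The 2082 date is consistent with the `Time` column holding Excel serial day numbers: `unit='d'` counts days from the Unix epoch (1970-01-01), whereas Excel's 1900 date system effectively counts from 1899-12-30 for dates after February 1900. Day 41030 is 2082-05-03 from the first origin but 2012-05-01 (the date in the file name) from the second. A minimal sketch, assuming the column really contains Excel serial numbers, is to pass `origin` to `pd.to_datetime`; the helper name `excel_serial_to_datetime` is hypothetical.

```python
import pandas as pd


def excel_serial_to_datetime(values):
    """Convert Excel serial day numbers (fractional part = time of day)
    to timestamps. Hypothetical helper; assumes Excel's 1900 date system,
    which effectively counts days from 1899-12-30."""
    return pd.to_datetime(values, unit="D", origin="1899-12-30")


serials = pd.Series([41030.0, 41030.25])  # midnight and 06:00 on one day

print(pd.to_datetime(serials, unit="D"))  # Unix origin: lands in 2082
print(excel_serial_to_datetime(serials))  # Excel origin: lands in 2012
```

In the question's code this would replace the `unit='d'` conversion: `raw_data['Time'] = excel_serial_to_datetime(raw_data['Time'])`. The hours are unaffected either way, which is why they already looked correct.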
    

2 Answers

Answer 0 (score: 1)

You can apply a function that returns a category for each timestamp, and then group by that category:

    import pandas as pd

    data = [('2082-05-03 00:00:00', 43.005593, -56.509746, 25.271271),
            ('2082-05-03 01:00:00', 55.114918, -59.173852, 31.849262),
            ('2082-05-03 02:00:00', 63.990762, -64.699492, 52.426017),
            ('2082-05-03 03:00:00', 56.508333, -65.489083, 36.188083),
            ('2082-05-03 04:00:00', 36.217295, -59.198033, 2.404426),
            ('2082-05-03 05:00:00', 36.153814, -62.156187, 24.779830),
            ('2082-05-03 06:00:00', 93.920334, -57.923000, 77.654250)]

    df = pd.DataFrame(data=data, columns=['Time', 'F1', 'F2', 'F3'])
    df['Time'] = pd.to_datetime(df['Time'])

    def time_cat(t):
        """Return the time-of-day category for a timestamp."""
        hour = t.hour
        if hour < 5:
            return 'Night(10PM-5AM)'
        if hour < 9:
            return 'Morning(5AM-9AM)'
        if hour < 18:
            return 'Day(9AM-6PM)'
        if hour < 22:
            return 'Evening(6PM-10PM)'
        # hour >= 22
        return 'Night(10PM-5AM)'

    # Group by category and average; numeric_only=True keeps the datetime
    # 'Time' column out of the mean.
    df.groupby(df['Time'].apply(time_cat)).mean(numeric_only=True)

                             F1         F2         F3
    Time
    Morning(5AM-9AM)  65.037074 -60.039594  51.217040
    Night(10PM-5AM)   50.967380 -61.014041  29.627812
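A vectorized variant of the same idea also covers the two extras asked for in the question: rows in Morning/Day/Evening/Night order and a final per-row average column. It assumes a frame with a DatetimeIndex and only numeric columns, such as the question's `ave_data`, and swaps the per-row `apply` for a 24-entry hour lookup table. `PERIOD_OF_HOUR`, `PERIOD_ORDER` and `period_means` are hypothetical names; the sample values are the rows shown in the question.

```python
import numpy as np
import pandas as pd

# Period name for each hour of the day, indexed by hour (0-23):
# 0-4 Night, 5-8 Morning, 9-17 Day, 18-21 Evening, 22-23 Night.
PERIOD_OF_HOUR = np.array(
    ["Night"] * 5 + ["Morning"] * 4 + ["Day"] * 9 + ["Evening"] * 4 + ["Night"] * 2
)
PERIOD_ORDER = ["Morning", "Day", "Evening", "Night"]


def period_means(df):
    """Average every column of a DatetimeIndex-ed frame per time-of-day period.

    Hypothetical helper: works for any number of value columns and adds an
    'Average' column holding the mean of each resulting row.
    """
    periods = PERIOD_OF_HOUR[np.asarray(df.index.hour)]
    out = df.groupby(periods).mean()
    # Put rows in the question's order; drop periods that have no data at all.
    out = out.reindex(PERIOD_ORDER).dropna(how="all")
    out["Average"] = out.mean(axis=1)
    return out


sample = pd.DataFrame(
    {
        "F1": [43.005593, 55.114918, 63.990762, 56.508333, 36.217295, 36.153814, 93.920334],
        "F2": [-56.509746, -59.173852, -64.699492, -65.489083, -59.198033, -62.156187, -57.923000],
        "F3": [25.271271, 31.849262, 52.426017, 36.188083, 2.404426, 24.779830, 77.654250],
    },
    index=pd.date_range("2082-05-03", periods=7, freq="h"),
)
print(period_means(sample))
```

Because `groupby(...).mean()` averages whatever columns are present, the same call works when a file yields more or fewer columns than F1 to F3.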

Answer 1 (score: 0)

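How about this? Note that I add three columns, so I work on `df`, a copy of `raw_data`. If modifying `raw_data` is fine, the copy is of course unnecessary.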

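    def Time(hour):
        """Return the name of the period that an hour (0-23) falls in."""
        if 5 <= hour < 9:
            return 'Morning'
        elif 9 <= hour < 18:
            return 'Day'
        elif 18 <= hour < 22:
            return 'Evening'
        else:
            return 'Night'

    df = raw_data.copy()
    df['date'] = df['Time'].dt.date      # calendar date of each reading
    df['hour'] = df['Time'].dt.hour      # hour of day, 0-23
    df['time'] = df['hour'].apply(Time)  # period name
    # Average the remaining columns per (date, period); the helper 'hour'
    # column and the datetime 'Time' column are dropped first.
    ave_data = df.drop(columns=['Time', 'hour']).groupby(['date', 'time']).mean()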
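One caveat with grouping by calendar date: the Night period runs past midnight, so for a given date the 00:00-04:59 hours (the end of the previous night) are averaged together with 22:00-23:59 (the start of the next night). A minimal sketch of one way around this, assuming each night should be attributed to the date on which it starts, is to shift the timestamps back 5 hours before taking the date. `period_of_hour` and `daily_period_means` are hypothetical names, and the frame is assumed to have a DatetimeIndex and only numeric columns, like `ave_data`.

```python
import numpy as np
import pandas as pd


def period_of_hour(hour):
    """Period name for an hour of the day (same boundaries as Time above)."""
    if 5 <= hour < 9:
        return "Morning"
    if 9 <= hour < 18:
        return "Day"
    if 18 <= hour < 22:
        return "Evening"
    return "Night"


def daily_period_means(df):
    """Mean of every column per (date, period) for a DatetimeIndex-ed frame.

    Hypothetical helper: timestamps are shifted back 5 hours before taking
    the date, so 00:00-04:59 counts toward the night that began the
    previous evening.
    """
    dates = (df.index - pd.Timedelta(hours=5)).date
    periods = np.array([period_of_hour(h) for h in df.index.hour])
    out = df.groupby([dates, periods]).mean()
    out.index.names = ["date", "time"]
    return out


sample = pd.DataFrame(
    {"F1": [43.005593, 55.114918, 63.990762, 56.508333, 36.217295, 36.153814, 93.920334]},
    index=pd.date_range("2082-05-03", periods=7, freq="h"),
)
print(daily_period_means(sample))
```

With the sample rows, hours 00:00-04:00 on 2082-05-03 are reported as the Night of 2082-05-02, while 05:00 and 06:00 remain the Morning of 2082-05-03.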