Using Pandas GroupBy columns in a new DataFrame

Asked: 2015-11-09 21:15:21

Tags: python pandas

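I have a large temperature time series that I am running some calculations on. I take the hourly observations and create daily statistics. After the calculations are done, I want to take the grouped Year and Julian day from the GroupBy object ('aa' below), together with the resulting drangeT and drangeHI arrays, and put them into a brand-new DataFrame containing just those variables. The code is as follows: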

import numpy as np
import scipy.stats as st
import pandas as pd

city = ['BUF']#,'PIT','CIN','CHI','STL','MSP','DET']
mons = np.arange(5,11,1)

for a in city:
    data = 'H:/Classwork/GEOG612/Project/'+a+'Data_cut.txt'

    df = pd.read_table(data,sep='\t')
    df['TempF'] = ((9./5.)*df['TempC'])+32.
    df1 = df.loc[df['Month'].isin(mons)]
    aa = df1.groupby(['Year','Julian'],as_index=False)
    maxT = aa.aggregate({'TempF':np.max})
    minT = aa.aggregate({'TempF':np.min})
    maxHI = aa.aggregate({'HeatIndex':np.max})
    minHI = aa.aggregate({'HeatIndex':np.min})

    drangeT = maxT - minT
    drangeHI = maxHI - minHI

    df2 = pd.DataFrame(data = {'Year':aa.Year,'Day':aa.Julian,'TRange':drangeT,'HIRange':drangeHI})

All of the variables in the df2 command have length 8250, but when I run it I get this error message:

 ValueError: cannot copy sequence with size 3 to array axis with dimension 8250 

Any suggestions are welcome and appreciated. Thanks!
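For reference, here is a minimal sketch of one way the daily ranges could be assembled into a new DataFrame. A likely cause of the error is that, with as_index=False, each aggregate call returns a three-column DataFrame (Year, Julian, TempF) rather than a single column, and that aa.Year and aa.Julian are GroupBy objects rather than columns of values. The small df1 below is made-up stand-in data (an assumption, not the real file), and grouping with the default index and then calling reset_index is only one possible approach.

```python
import pandas as pd

# Hypothetical stand-in for df1: a few "hourly" rows over two days.
df1 = pd.DataFrame({
    "Year":      [2014, 2014, 2014, 2014, 2014, 2014],
    "Julian":    [121, 121, 121, 122, 122, 122],
    "TempF":     [50.0, 62.0, 55.0, 48.0, 70.0, 66.0],
    "HeatIndex": [49.0, 63.0, 54.0, 47.0, 72.0, 65.0],
})

# Group by day; the group keys become the index of each aggregated result.
aa = df1.groupby(["Year", "Julian"])
cols = ["TempF", "HeatIndex"]

# Daily range = daily max - daily min, aligned on the (Year, Julian) index.
drange = aa[cols].max() - aa[cols].min()

# Rename the value columns, then turn the group keys back into columns.
df2 = (
    drange.rename(columns={"TempF": "TRange", "HeatIndex": "HIRange"})
          .reset_index()
          .rename(columns={"Julian": "Day"})
)

print(df2)
```

Because the subtraction is done on results indexed by (Year, Julian), the key columns are not subtracted from themselves, and reset_index restores them as ordinary columns next to TRange and HIRange.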

0 answers:

No answers yet