Sample wind dataset:
```
           RPT VAL ROS KIL SHA BIR DUB CLA MUL CLO BEL MAL
DATE
1961-01-04 10.58 6.63 11.75 4.58 4.54 2.88 8.63 1.79 5.83 5.88 5.46 10.88
1961-01-05 13.33 13.25 11.42 6.17 10.71 8.21 11.92 6.54 10.92 10.34 12.92 11.83
1961-01-06 13.21 8.12 9.96 6.67 5.37 4.50 10.67 4.42 7.17 7.50 8.12 13.17
1961-02-07 13.50 14.29 9.50 4.96 12.29 8.33 9.17 9.29 7.58 7.96 13.96 13.79
1961-02-08 10.96 9.75 7.62 5.91 9.62 7.29 14.29 7.62 9.25 10.46 16.62 16.46
1961-03-04 10.58 6.63 11.75 4.58 4.54 2.88 8.63 1.79 5.83 5.88 5.46 10.88
1962-03-05 13.33 13.25 11.42 6.17 10.71 8.21 11.92 6.54 10.92 10.34 12.92 11.83
1962-06-06 13.21 8.12 9.96 6.67 5.37 4.50 10.67 4.42 7.17 7.50 8.12 13.17
1968-07-07 13.50 14.29 9.50 4.96 12.29 8.33 9.17 9.29 7.58 7.96 13.96 13.79
1968-07-08 10.96 9.75 7.62 5.91 9.62 7.29 14.29 7.62 9.25 10.46 16.62 16.46
1976-08-04 10.58 6.63 11.75 4.58 4.54 2.88 8.63 1.79 5.83 5.88 5.46 10.88
1976-08-05 13.33 13.25 11.42 6.17 10.71 8.21 11.92 6.54 10.92 10.34 12.92 11.83
1978-09-06 13.21 8.12 9.96 6.67 5.37 4.50 10.67 4.42 7.17 7.50 8.12 13.17
1978-09-07 13.50 14.29 9.50 4.96 12.29 8.33 9.17 9.29 7.58 7.96 13.96 13.79
1978-12-08 10.96 9.75 7.62 5.91 9.62 7.29 14.29 7.62 9.25 10.46 16.62 16.46
```
The full dataset is here.
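To reproduce the problem locally, here is a minimal sketch that loads the first few rows of the sample into a DataFrame with a DatetimeIndex. It assumes the header is rewritten as a single line starting with `DATE` (the pasted sample shows pandas' two-line display header, which `read_csv` would not parse as intended).

```python
import io

import pandas as pd

# First rows of the sample, with DATE written as an ordinary header column
sample = """DATE RPT VAL ROS KIL SHA BIR DUB CLA MUL CLO BEL MAL
1961-01-04 10.58 6.63 11.75 4.58 4.54 2.88 8.63 1.79 5.83 5.88 5.46 10.88
1961-01-05 13.33 13.25 11.42 6.17 10.71 8.21 11.92 6.54 10.92 10.34 12.92 11.83
1961-02-07 13.50 14.29 9.50 4.96 12.29 8.33 9.17 9.29 7.58 7.96 13.96 13.79
"""

# Whitespace-separated columns; parse the DATE column into a DatetimeIndex
df = pd.read_csv(io.StringIO(sample), sep=r"\s+", index_col="DATE", parse_dates=True)
print(df)
```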
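In this dataset, the columns are locations and the values are wind speeds. I want to calculate the mean wind speed for each month in the dataset, but I want January 1961 and January 1962 to be treated as different months. I tried to do it with a for loop: first I created a column named "Month", then I assigned its values with the loop below: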
```python
# Create the 'Month' column first, then fill it row by row.
# .iloc writes into `data` itself; the chained form data['Month'][i] = ...
# may silently write to a temporary copy instead.
data['Month'] = ''
month_col = data.columns.get_loc('Month')

for i in range(len(data.index)):
    if data.index[i].month == 1:
        if data.index[i].year == 1961:
            data.iloc[i, month_col] = 'January 61'
        elif data.index[i].year == 1962:
            data.iloc[i, month_col] = 'January 62'
        else:
            data.iloc[i, month_col] = 'January'
    elif data.index[i].month == 2:
        data.iloc[i, month_col] = 'February'
    elif data.index[i].month == 3:
        data.iloc[i, month_col] = 'March'
    elif data.index[i].month == 4:
        data.iloc[i, month_col] = 'April'
    elif data.index[i].month == 5:
        data.iloc[i, month_col] = 'May'
    elif data.index[i].month == 6:
        data.iloc[i, month_col] = 'June'
    elif data.index[i].month == 7:
        data.iloc[i, month_col] = 'July'
    elif data.index[i].month == 8:
        data.iloc[i, month_col] = 'August'
    elif data.index[i].month == 9:
        data.iloc[i, month_col] = 'September'
    elif data.index[i].month == 10:
        data.iloc[i, month_col] = 'October'
    elif data.index[i].month == 11:
        data.iloc[i, month_col] = 'November'
    elif data.index[i].month == 12:
        data.iloc[i, month_col] = 'December'
```
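I would then group by `data['Month']` and take the mean. But the loop takes a very long time to run, and I don't want to wait that long every time I run the program. How else can I solve this?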
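The loop is slow mainly because it visits every row in Python and writes one cell at a time. As a minimal sketch, the same labels can be built for all rows at once with `DatetimeIndex.strftime` and `numpy.where`. The rows dated January 1962 and January 1963 are hypothetical, added only to exercise every branch of the loop, and `%B` follows the current locale, so English month names assume an English or C locale.

```python
import numpy as np
import pandas as pd

# Hypothetical toy rows shaped like the sample: DatetimeIndex named DATE
data = pd.DataFrame(
    {"RPT": [10.58, 13.33, 13.21, 13.50]},
    index=pd.DatetimeIndex(
        ["1961-01-04", "1962-01-05", "1963-01-06", "1961-02-07"], name="DATE"
    ),
)

full_month = data.index.strftime("%B")         # e.g. 'January'
month_and_year = data.index.strftime("%B %y")  # e.g. 'January 61'
special_jan = (data.index.month == 1) & data.index.year.isin([1961, 1962])

# Same labels the loop produces, computed for every row in one vectorized step
data["Month"] = np.where(special_jan, month_and_year, full_month)
monthly_mean = data.groupby("Month").mean()
print(monthly_mean)
```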
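Note: the actual dataset is not quite the same as the sample. I merged the columns ['Yr','Mo','Dy'] into a single column named "DATE" and then set "DATE" as the index. I also removed all NaN values with `data.dropna(inplace=True)`.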
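The Yr/Mo/Dy merge described in the note might look like the sketch below. It assumes `Yr` already holds four-digit years and uses a small hypothetical frame with a single station column (values taken from the sample). `pd.to_datetime` can assemble dates from a DataFrame whose columns are named `year`, `month` and `day`, hence the rename.

```python
import pandas as pd

# Hypothetical raw layout: separate Yr/Mo/Dy columns plus one station column
raw = pd.DataFrame({
    "Yr": [1961, 1961, 1962],
    "Mo": [1, 1, 3],
    "Dy": [4, 5, 5],
    "RPT": [10.58, 13.33, 13.33],
})

# to_datetime assembles dates from columns named year/month/day
dates = pd.to_datetime(
    raw[["Yr", "Mo", "Dy"]].rename(columns={"Yr": "year", "Mo": "month", "Dy": "day"})
)

data = raw.drop(columns=["Yr", "Mo", "Dy"])
data.index = pd.DatetimeIndex(dates, name="DATE")
data = data.dropna()  # same effect as data.dropna(inplace=True)
print(data)
```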
Answer 0 (score: 1)
Try:
```python
import pandas as pd

df.index = pd.to_datetime(df.index)  # make sure the index is a DatetimeIndex
df.groupby([df.index.year, df.index.month]).mean()
```
```
                RPT        VAL        ROS ...       CLO       BEL    MAL
DATE DATE ...
1961 1    12.373333   9.333333  11.043333 ...  7.906667  8.833333 11.960
     2    12.230000  12.020000   8.560000 ...  9.210000 15.290000 15.125
     3    10.580000   6.630000  11.750000 ...  5.880000  5.460000 10.880
1962 3    13.330000  13.250000  11.420000 ... 10.340000 12.920000 11.830
     6    13.210000   8.120000   9.960000 ...  7.500000  8.120000 13.170
1968 7    12.230000  12.020000   8.560000 ...  9.210000 15.290000 15.125
1976 8    11.955000   9.940000  11.585000 ...  8.110000  9.190000 11.355
1978 9    13.355000  11.205000   9.730000 ...  7.730000 11.040000 13.480
     12   10.960000   9.750000   7.620000 ... 10.460000 16.620000 16.460
```
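The result above keeps year and month as two separate index levels (both named DATE, since both come from the same index). If a single label per month is more convenient, a sketch that groups by a monthly `PeriodIndex` instead, using five rows and two columns from the sample:

```python
import pandas as pd

# Five rows from the sample (RPT and MAL columns only)
df = pd.DataFrame(
    {"RPT": [10.58, 13.33, 13.21, 13.50, 10.96],
     "MAL": [10.88, 11.83, 13.17, 13.79, 16.46]},
    index=pd.DatetimeIndex(
        ["1961-01-04", "1961-01-05", "1961-01-06", "1961-02-07", "1961-02-08"],
        name="DATE",
    ),
)

# One group per calendar month, keyed by a single Period label such as 1961-01
monthly = df.groupby(df.index.to_period("M")).mean()
print(monthly)
```

Each Period carries its year, so 1961-01 and 1962-01 are separate groups, which is the "January 61 vs January 62" distinction the question asks for, applied to every month.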
Answer 1 (score: 0)
I think the groupby approach you tried is the way to go; it only needs year and month columns to group on:
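```python
df = df.assign(year=df.index.year, month=df.index.month)  # columns from the DATE index
df.groupby(['year', 'month'])['RPT'].mean().reset_index()
```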
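Another option that works directly on the DatetimeIndex is `resample`. A sketch with three rows from the sample: unlike `groupby`, `resample` creates a bin for every month in the covered range, so months with no observations come back as all-NaN rows, which `dropna(how="all")` removes here. The `"MS"` alias labels each bin by its month start.

```python
import pandas as pd

# Three rows from the sample; February 1961 has no observations
df = pd.DataFrame(
    {"RPT": [10.58, 13.33, 10.58]},
    index=pd.DatetimeIndex(["1961-01-04", "1961-01-05", "1961-03-04"], name="DATE"),
)

# Calendar-month bins; empty months become all-NaN rows and are dropped
monthly = df.resample("MS").mean().dropna(how="all")
print(monthly)
```

On the full 1961 to 1978 range this creates many empty bins before dropping them, so the groupby variants may be preferable when observations are sparse.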