Question

I am trying to use pandas to compute a daily climatology. My code is:

import random

import pandas as pd

dates      = pd.date_range('1950-01-01', '1953-12-31', freq='D')
rand_data  = [int(1000*random.random()) for i in range(len(dates))]
cum_data   = pd.Series(rand_data, index=dates)
cum_data.to_csv('test.csv', sep="\t")

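cum_data is a Series indexed by daily dates from 1 January 1950 to 31 December 1953. I want to create a new vector of length 365 whose first element contains the average of rand_data for 1 January of 1950, 1951, 1952 and 1953, whose second element contains the same average for 2 January, and so on.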

Any suggestions on how I can do this using pandas?

Answer 1

You can group by the day of the year and then compute the mean of each group:

cum_data.groupby(cum_data.index.dayofyear).mean()

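However, you have to be aware of leap years, which cause problems with this approach: in a leap year every date after 28 February has its day of year shifted by one, so the same calendar date ends up in different groups. As an alternative, you can group by month and day: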

In [13]: cum_data.groupby([cum_data.index.month, cum_data.index.day]).mean()
Out[13]:
1  1     462.25
   2     631.00
   3     615.50
   4     496.00
...
12  28    378.25
    29    427.75
    30    528.50
    31    678.50
Length: 366, dtype: float64
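
To make the leap-year problem concrete, here is a minimal sketch that lists which calendar dates share a single dayofyear value over the question's date range. The variable names (day60, labels, day366) are only illustrative.

```python
import pandas as pd

dates = pd.date_range('1950-01-01', '1953-12-31', freq='D')

# Calendar dates that share dayofyear == 60 across the four years.
day60 = dates[dates.dayofyear == 60]
labels = [d.strftime('%Y-%m-%d') for d in day60]
print(labels)
# ['1950-03-01', '1951-03-01', '1952-02-29', '1953-03-01']

# dayofyear == 366 only exists in the leap year, so that group has one member.
day366 = dates[dates.dayofyear == 366]
print(len(day366))  # 1
```

So grouping by dayofyear averages 29 February 1952 together with 1 March of the other years, whereas grouping by (month, day) keeps each calendar date in its own group.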

Answer 2

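@joris. Thanks. Your answer was just what I needed to compute daily climatologies with pandas, but you stopped short of the final step: re-mapping the (month, day) index back to a day-of-year index that is valid for all years, including leap years, i.e. 1 through 366. So I thought I'd share my solution for other users. 1950 through 1953 is 4 years with one leap year, 1952. Note that because random values are used, each run will give different results.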

...   
from datetime import date
doy = []
doy_mean = []
doy_size = []
for name, group in cum_data.groupby([cum_data.index.month, cum_data.index.day]):
  (mo, dy) = name
  # Note: can use any leap year here.
  yrday = (date(1952, mo, dy)).timetuple().tm_yday
  doy.append(yrday)
  doy_mean.append(group.mean())
  doy_size.append(group.count())
  # Note: useful climatology stats are also available via group.describe(), returned as a Series
  #desc = group.describe()
  # desc["mean"], desc["min"], desc["max"], std,quartiles, etc.

# we lose the counts here.
new_cum_data  = pd.Series(doy_mean, index=doy)
print(new_cum_data.loc[366])
>> 634.5

pd_dict = {}
pd_dict["mean"] = doy_mean
pd_dict["size"] = doy_size
cum_data_df = pd.DataFrame(data=pd_dict, index=doy)

print(cum_data_df.loc[366])
>> mean    634.5
>> size      4.0
>> Name: 366, dtype: float64
# and just to check Feb 29
print(cum_data_df.loc[60])
>> mean    343
>> size      1
>> Name: 60, dtype: float64
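
The same re-mapping can probably be written without the explicit loop. The following is a sketch, not the answer's original code: it assumes that agg(['mean', 'count']) is an acceptable replacement for collecting the lists by hand, uses 2000 as the reference leap year (any leap year would do), and seeds NumPy's generator only so the example is repeatable.

```python
from datetime import date

import numpy as np
import pandas as pd

dates = pd.date_range('1950-01-01', '1953-12-31', freq='D')
rng = np.random.default_rng(0)  # fixed seed: an assumption for repeatability
cum_data = pd.Series(rng.integers(0, 1000, size=len(dates)), index=dates)

# Mean and count for every (month, day) pair in a single pass.
stats = cum_data.groupby(
    [cum_data.index.month, cum_data.index.day]
).agg(['mean', 'count'])

# Replace the (month, day) index with the day of year in a leap year,
# so that 29 February maps to 60 and 31 December maps to 366.
stats.index = [
    date(2000, int(mo), int(dy)).timetuple().tm_yday
    for mo, dy in stats.index
]

print(stats.loc[60])   # 29 February: count is 1 (only 1952)
print(stats.loc[366])  # 31 December: count is 4
```

Keeping mean and count in one DataFrame avoids losing the counts, which matters because the 29 February row is based on a single value.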

Answer 3

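In the hope that it helps someone, I want to post my solution for getting a climatology series with the same index and length as the original time series.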

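I use joris' solution to get a "model climatology" of 365/366 elements, then I build the series I want by taking the values from this model climatology and the time index from my original time series. This way, things like leap years are taken care of automatically.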

#I start with my time series named 'serData'.
#I apply joris' solution to it, getting a 'model climatology' of length 365 or 366.
serClimModel = serData.groupby([serData.index.month, serData.index.day]).mean()

#Now I build the climatology series, taking values from serClimModel depending on the index of serData.
serClimatology = serClimModel.loc[list(zip(serData.index.month, serData.index.day))]

#Now serClimatology has a time index like this: [1,1] ... [12,31].
#So, as a final step, I take as time index the one of serData.
serClimatology.index = serData.index
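
A possible shortcut for the same result is groupby(...).transform('mean'), which broadcasts each group's mean back onto the original rows and therefore keeps the original index. This is a different technique from the lookup above, shown only as a sketch; the sample serData built here is an assumption for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical sample series standing in for 'serData'.
dates = pd.date_range('1950-01-01', '1953-12-31', freq='D')
rng = np.random.default_rng(0)  # fixed seed: an assumption for repeatability
serData = pd.Series(rng.integers(0, 1000, size=len(dates)).astype(float),
                    index=dates)

# Each row receives the mean of its own (month, day) group;
# the result has the same DatetimeIndex and length as serData.
serClimatology = serData.groupby(
    [serData.index.month, serData.index.day]
).transform('mean')

print(serClimatology.head())
```

Every 1 January gets the same value in all four years, and 29 February 1952 simply gets its own value, since it is the only member of its group.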

Computing daily climatology using pandas (Python)
