I started learning two weeks ago and I am now a bit stuck. I have two TimeSeries that look like this:
2011-01-09 00:00:00+00:00 7.430126
2011-01-09 01:00:00+00:00 6.793855
2011-01-09 02:00:00+00:00 6.675949
2011-01-09 03:00:00+00:00 6.756636
2011-01-09 04:00:00+00:00 6.875174
2011-01-09 05:00:00+00:00 5.432611
2011-01-09 06:00:00+00:00 6.059197
2011-01-09 21:00:00+00:00 5.338928
2011-01-09 22:00:00+00:00 5.259672
2011-01-09 23:00:00+00:00 5.247196
2011-01-10 00:00:00+00:00 5.889274
2011-01-10 01:00:00+00:00 6.133871
2011-01-10 02:00:00+00:00 6.111958
2011-01-10 03:00:00+00:00 5.873732
2011-01-10 04:00:00+00:00 5.627684
2011-01-10 05:00:00+00:00 5.265644
2011-01-10 06:00:00+00:00 5.505559
2011-01-10 21:00:00+00:00 3.835050
2011-01-10 22:00:00+00:00 3.879653
2011-01-10 23:00:00+00:00 4.034543
2011-01-11 00:00:00+00:00 4.844272
2011-01-11 01:00:00+00:00 4.670967
2011-01-11 02:00:00+00:00 4.584164
2011-01-11 03:00:00+00:00 4.786821
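This is wind speed measurement data, which I want to compare with model data. More specifically, I want to compare the nighttime wind speeds (21:00 - 6:00). So I defined a function: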
def func(model, measure):
    return (model-measure).mean()
In addition, I wrote a loop over the data:
mean_night = []
start = 7
for a in night:
    mean_night.append(func(model, measure[start:(start+10)]))
    start = start+11
    if start>5378:
        break
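The problem is that I lose my time index, and some data is missing (e.g. 1 day or 1 week), so it is hard for me to reindex the result with DateRange. In the end, it should look like this: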
date difference_means
2011-01-09 diff_1
2011-01-09 diff_2
and so on. I am using pandas 0.7.1. Thanks for your support! (Sorry for my poor English :P)
Answer 0 (score: 2)
With pandas 0.8.1, for hourly sampled data:
In [56]: import datetime
In [57]: import pandas
In [58]: import numpy as np
In [59]: index = pandas.date_range(start='2011-01-09', periods=240, freq='H')
In [60]: s = pandas.Series(np.random.randn(len(index)), index)
In [61]: s_night = s[(s.index.hour >= 21) | (s.index.hour <= 6)]
In [62]: def day_or_night(dates):
   ....:     r = []
   ....:     for date in dates:
   ....:         if (date.hour >= 21) or (date.hour <= 6):
   ....:             # label the night by the date on which it started
   ....:             d = datetime.datetime(date.year, date.month, date.day)
   ....:             if (date.hour <= 6):
   ....:                 # hours after midnight belong to the previous evening
   ....:                 d = d - pandas.offsets.Day()
   ....:             r.append(d)
   ....:         else:
   ....:             r.append('day')
   ....:     return r
   ....:
In [63]: s_night.groupby(day_or_night(s_night.index)).mean()
Out[63]:
2011-01-08 0.652095
2011-01-09 0.004129
2011-01-10 0.457892
2011-01-11 -0.078547
2011-01-12 0.008087
2011-01-13 0.043568
2011-01-14 0.505970
2011-01-15 0.150971
2011-01-16 0.107265
2011-01-17 0.117811
2011-01-18 -0.191193
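The same grouping can also be written without an explicit Python loop. The following is only a sketch, assuming a reasonably recent pandas and an hourly DatetimeIndex; the helper name nightly_mean and its parameters are made up for illustration and are not part of pandas. It shifts every timestamp back by seven hours, so that 00:00 to 06:00 falls on the previous calendar day, and then groups by the truncated date.

```python
import numpy as np
import pandas as pd


def nightly_mean(series, start_hour=21, end_hour=6):
    """Mean per night (start_hour..end_hour, inclusive), labelled by the evening's date.

    Hypothetical helper for illustration; not a pandas function.
    """
    hours = series.index.hour
    night = series[(hours >= start_hour) | (hours <= end_hour)]
    # Shift back so the hours after midnight land on the previous calendar
    # day, then truncate to midnight to get one label per night.
    labels = (night.index - pd.Timedelta(hours=end_hour + 1)).normalize()
    return night.groupby(labels).mean()


# Three days of made-up hourly data: 0.0, 1.0, ..., 71.0
index = pd.date_range("2011-01-09", periods=72, freq=pd.Timedelta(hours=1))
s = pd.Series(np.arange(72, dtype=float), index=index)

result = nightly_mean(s)
print(result)
```

Because the labels are computed from the timestamps themselves, missing hours or whole missing days simply produce smaller or absent groups rather than shifting later nights.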
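Answer 1 (score: 0)
You should upgrade to 0.8.1 and take advantage of all the new time series functionality. See http://pandas.pydata.org for the documentation.
In the latest version, check out functions such as between_time, which filters on a specific time-of-day range.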
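As a small illustration of between_time, here is a sketch on made-up hourly data, assuming a reasonably recent pandas. When the start time is later than the end time, the window wraps around midnight, which is what a 21:00 to 06:00 night needs.

```python
import numpy as np
import pandas as pd

# Two days of made-up hourly data
index = pd.date_range("2011-01-09", periods=48, freq=pd.Timedelta(hours=1))
s = pd.Series(np.arange(48, dtype=float), index=index)

# start_time later than end_time makes the window wrap around midnight;
# both end points are included by default.
s_night = s.between_time("21:00", "06:00")

print(sorted(set(s_night.index.hour)))
```

The filtered series keeps its original timestamps, so it can be grouped per night afterwards.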
Answer 2 (score: 0)
I finally found a working solution:
import numpy as np
import pandas as pd
from datetime import datetime

# Note: dr, unter11 and auf11 are not defined in this snippet.
hr = dr.map(lambda x: x.hour)
meantime = lambda x: x.replace(hour=0)
datra = pd.DateRange('2011/1/1', '2011/12/31', offset=pd.datetools.day)
rise = pd.TimeSeries(np.cos(((datra.map(lambda x: (x-datetime(x.year,1,1)).total_seconds() / 86400) + 10) / 183. * np.pi)) * -2. + 17., index=datra)
set = pd.TimeSeries(np.cos(((datra.map(lambda x: (x-datetime(x.year,1,1)).total_seconds() / 86400) + 10) / 183. * np.pi)) * 2.5 + 5., index=datra)

def bias_night(liste, group):
    i = 0  # initialised inside the function; as a global, the assignment below would raise UnboundLocalError
    while (i<546):
        if (i<364):
            z = group[dr[hr>unter11[i]]].combine_first(group[dr[hr<auf11[i+1]]]).groupby(meantime).mean()
            liste.append(z[i])
        else:
            z = group[dr[hr>unter11[i-365]]].combine_first(group[dr[hr<auf11[i-365+1]]]).groupby(meantime).mean()
            liste.append(z[i])
        i = i+1
    t = group[dr[hr>unter11[364]]].combine_first(group[dr[hr<auf11[0]]]).groupby(meantime).mean()
    liste.insert(364, t[364])
liste is an empty list and group is one of my TimeSeries. In the end, I only need to subtract the resulting lists to get what I want:
2011-01-09 -1.179578
2011-01-10 -0.978171
2011-01-11 -0.335977
2011-01-12 0.080671
2011-01-13 -0.324661
2011-01-14 0.012359
2011-01-15 -0.549079
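If the nightly means are kept as Series indexed by date rather than as plain lists, the final subtraction can align on the index, so a missing night on either side does not shift the remaining values. This is a sketch with made-up numbers, assuming a reasonably recent pandas; the names model_night and measure_night are hypothetical.

```python
import pandas as pd

nights = pd.date_range("2011-01-09", periods=5, freq=pd.Timedelta(days=1))

# Hypothetical nightly means; the measurements lack the night of 2011-01-11.
model_night = pd.Series([5.0, 6.0, 7.0, 8.0, 9.0], index=nights)
measure_night = pd.Series([6.0, 6.5, 7.5, 9.5], index=nights.delete(2))

# Subtraction aligns on the date index; a night missing from either
# series becomes NaN and is dropped here.
bias = (model_night - measure_night).dropna()
bias.index.name = "date"

print(bias.rename("difference_means"))
```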