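I have two pandas DataFrames, X and Y, each containing intraday price and time data for the past month. I want to join Y onto X, that is, every time we see an update in X we take the prevailing price of Y. I want to do this within each day (because of overnight effects).
My current code is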
Y_asof = Y.groupby('Date').apply(lambda x: x.asof(X.index))
However, this raises the error
AttributeError: 'DataFrame' object has no attribute 'asof'
whereas it works when I run
Y_asof = Y.apply(lambda x: x.asof(X.index))
Sample data for X:
                                 Mid        Date
Time
2015-09-15 13:02:03.000049  7.575392  2015-09-15
2015-09-15 13:02:06.000049  7.575521  2015-09-15
2015-09-15 13:02:08.000049  7.575392  2015-09-15
2015-09-15 13:02:14.000049  7.575521  2015-09-15
2015-09-15 13:02:15.000048  7.575649  2015-09-15
Sample data for Y:
                                 Mid        Date
Time
2015-09-15 12:00:00.443000  4.650894  2015-09-15
2015-09-15 12:00:00.443000  4.650899  2015-09-15
2015-09-15 12:00:06.321000  4.650894  2015-09-15
2015-09-15 12:00:06.322000  4.650884  2015-09-15
2015-09-15 12:00:10.839000  4.650894  2015-09-15
Can anyone help? Many thanks!
Answer 0 (score: 2)
asof is a Series method, not a DataFrame method. It works on the Time column:
In [11]: Y.groupby('Date').apply(lambda x: x["Time"].asof(X.index))
Out[11]:
Time 0 1 2 3 4
Date
2015-09-15 2015-09-15 12:00:00.443000 2015-09-15 12:00:00.443000 2015-09-15 12:00:06.321000 2015-09-15 12:00:06.322000 2015-09-15 12:00:10.839000
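When you call apply directly on the DataFrame, the function is applied to each column (which is a Series), so asof is available. That is why the version without groupby works.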
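To illustrate what Series.asof does, here is a minimal sketch on a few made-up rows. The names y and x_index and all values are assumptions for illustration, not the asker's actual data.

```python
import pandas as pd

# Hypothetical Y prices, indexed by a sorted timestamp index.
y = pd.DataFrame(
    {"Mid": [4.650894, 4.650899, 4.650884]},
    index=pd.to_datetime([
        "2015-09-15 12:00:00.443",
        "2015-09-15 12:00:06.321",
        "2015-09-15 12:00:10.839",
    ]),
)

# Hypothetical X update times: one before any Y tick, two after all of them.
x_index = pd.to_datetime([
    "2015-09-15 11:59:00",
    "2015-09-15 13:02:03",
    "2015-09-15 13:02:06",
])

# Select a single column (a Series), then look up the last value
# at or before each X timestamp. Times with no earlier Y tick give NaN.
y_asof = y["Mid"].asof(x_index)
print(y_asof)
```

The result is indexed by the X timestamps, so it can be placed alongside X directly.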
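Answer 1 (score: 0)
I believe pandas raises the error because Y.groupby('Date') creates a GroupBy object, which has no asof method. If you are only using groupby as a way of ordering by date, you could use […] instead.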
Answer 2 (score: 0)
pandas 0.19 has an asof join (pd.merge_asof). Since you want the latest Y for each X, that join does what you need.
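A minimal sketch of such an asof join, assuming Time is a regular column rather than the index and that a Date column is derived from it. The frames, values, and suffixes below are made up for illustration. Passing by="Date" restricts matches to the same day, which is one way to avoid carrying the previous day's last Y price over to the next morning.

```python
import pandas as pd

# Hypothetical X and Y frames, both sorted by Time.
X = pd.DataFrame({
    "Time": pd.to_datetime([
        "2015-09-15 13:02:03",
        "2015-09-15 13:02:06",
        "2015-09-16 09:00:00",
    ]),
    "Mid": [7.575392, 7.575521, 7.575649],
})
Y = pd.DataFrame({
    "Time": pd.to_datetime([
        "2015-09-15 12:00:00",
        "2015-09-15 13:02:04",
        "2015-09-15 16:00:00",
    ]),
    "Mid": [4.650894, 4.650899, 4.650884],
})

# Calendar day of each row, used to keep matches within the same day.
X["Date"] = X["Time"].dt.normalize()
Y["Date"] = Y["Time"].dt.normalize()

# For each X row, take the last Y row at or before X's Time on the same Date.
joined = pd.merge_asof(X, Y, on="Time", by="Date", suffixes=("_X", "_Y"))
print(joined)
```

In this sketch the 2015-09-16 row gets NaN for Mid_Y because Y has no tick on that day yet. Dropping by="Date" would instead fill it with the previous day's last Y price.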