I have two DataFrames:
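    import datetime
    import numpy as np
    import pandas as pd
    # n, sids and the tuple lists were not shown; assumed here from the output below
    n, sids = 5, ['baz', 'bar', 'foo']
    date_1 = datetime.datetime(2008, 7, 1)
    date_2 = datetime.datetime(2008, 7, 6)
    t_ndx_1 = pd.date_range(date_1, periods=5, freq='30D')
    t_ndx_2 = pd.date_range(date_2, periods=5, freq='30D')
    ndx_tuple = [(d, s) for d in t_ndx_1 for s in sids]   # every (date, sid) pair
    ndx_tuple2 = [(d, s) for d in t_ndx_2 for s in sids]
    index_1 = pd.MultiIndex.from_tuples(ndx_tuple).set_names(['date', 'sid'])
    df_1 = pd.DataFrame(np.random.randn(n*3), index=index_1, columns=['price'])
    index_2 = pd.MultiIndex.from_tuples(ndx_tuple2).set_names(['date', 'sid'])
    df_2 = pd.DataFrame(np.arange(n*3), index=index_2, columns=['factor'])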
    print(df_1)
                       price
    date       sid
    2008-07-01 baz  0.952190
               bar  0.151116
               foo  1.016207
    2008-07-31 baz  0.651457
               bar -0.069647
               foo -0.307071
    2008-08-30 baz -0.135290
               bar  1.782500
               foo  0.178755
    2008-09-29 baz  1.871211
               bar  1.505863
               foo -1.749282
    2008-10-29 baz -0.369726
               bar  1.754219
               foo -0.206260
    print(df_2)
                    factor
    date       sid
    2008-07-06 baz       0
               bar       1
               foo       2
    2008-08-05 baz       3
               bar       4
               foo       5
    2008-09-04 baz       6
               bar       7
               foo       8
    2008-10-04 baz       9
               bar      10
               foo      11
    2008-11-03 baz      12
               bar      13
               foo      14
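I am trying to merge them to get the equivalent of merge_asof (I am on pandas 0.18.1, which does not have it). I found a solution for simple DataFrames without a MultiIndex: http://genericfunct.blogspot.ru/2014/10/pandas-asof-join-sample.html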
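For reference, the behavior I am trying to reproduce can be sketched roughly as follows, assuming pandas 0.19 or later, where pd.merge_asof is available. The frames are rebuilt here with deterministic prices (an assumption made only so the result is easy to check), and the names left, right, merged and result are purely illustrative.

```python
import numpy as np
import pandas as pd

# Rebuild the two frames; prices are deterministic here only for illustration.
sids = ['baz', 'bar', 'foo']
dates_1 = pd.date_range('2008-07-01', periods=5, freq='30D')
dates_2 = pd.date_range('2008-07-06', periods=5, freq='30D')
index_1 = pd.MultiIndex.from_product([dates_1, sids], names=['date', 'sid'])
index_2 = pd.MultiIndex.from_product([dates_2, sids], names=['date', 'sid'])
df_1 = pd.DataFrame({'price': np.arange(15, dtype=float)}, index=index_1)
df_2 = pd.DataFrame({'factor': np.arange(15)}, index=index_2)

# merge_asof works on columns, so move the index levels into columns.
# Both frames must be sorted by the "on" key.
left = df_2.reset_index().sort_values('date')
right = df_1.reset_index().sort_values('date')

# For each row of left, take the latest right row with the same sid
# whose date is on or before the left date (the default backward search).
merged = pd.merge_asof(left, right, on='date', by='sid')
result = merged.set_index(['date', 'sid']).sort_index()
print(result)
```

Each row of df_2 should pick up the price of the same sid from the most recent df_1 date that is not later than its own date.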
My solution:
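    def concat_asof(row, df):
        """Append to row the latest df row for the same sid dated on or before row's date."""
        asof_date, sid = row.name
        idx = pd.IndexSlice
        # df must already be sorted by index for this partial slice to work
        match = df.loc[idx[:asof_date, sid], :].iloc[-1]
        row = pd.concat([row, match])  # Series.append was removed in pandas 2.0
        row.name = (asof_date, sid)
        return row

    df_1 = df_1.sort_index()  # sort once instead of on every call
    df_2.apply(lambda x: concat_asof(x, df_1), axis=1)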
Out:
                    factor     price
    date       sid
    2008-07-06 bar     1.0 -0.201496
               baz     0.0 -0.271080
               foo     2.0 -0.638019
    2008-08-05 bar     4.0  1.253838
               baz     3.0  0.393520
               foo     5.0  0.682805
    2008-09-04 bar     7.0  0.706282
               baz     6.0  1.431680
               foo     8.0 -1.169740
    2008-10-04 bar    10.0  0.638859
               baz     9.0  0.330555
               foo    11.0 -1.221649
    2008-11-03 bar    13.0 -0.507731
               baz    12.0 -0.221221
               foo    14.0  0.021257
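One possible way to avoid the row-by-row apply without merge_asof is to pivot the prices to one column per sid and forward-fill them onto the df_2 dates with reindex(method='ffill'). This is only a rough sketch of a different technique, not the approach from the linked post. It assumes every sid appears on every df_1 date, and the deterministic prices and the names wide, aligned and price are illustrative.

```python
import numpy as np
import pandas as pd

# Rebuild the two frames; prices are deterministic here only for illustration.
sids = ['baz', 'bar', 'foo']
dates_1 = pd.date_range('2008-07-01', periods=5, freq='30D')
dates_2 = pd.date_range('2008-07-06', periods=5, freq='30D')
index_1 = pd.MultiIndex.from_product([dates_1, sids], names=['date', 'sid'])
index_2 = pd.MultiIndex.from_product([dates_2, sids], names=['date', 'sid'])
df_1 = pd.DataFrame({'price': np.arange(15, dtype=float)}, index=index_1)
df_2 = pd.DataFrame({'factor': np.arange(15)}, index=index_2)

# One row per date, one column per sid.
wide = df_1['price'].unstack('sid')

# For each df_2 date, take the last known price at or before that date.
target_dates = df_2.index.get_level_values('date').unique()
aligned = wide.reindex(target_dates, method='ffill')

# Back to a Series indexed by (date, sid), then line it up with df_2's rows.
price = aligned.unstack().swaplevel()
result = df_2.copy()
result['price'] = price.reindex(df_2.index).values
print(result)
```

A df_2 date earlier than the first df_1 date would get NaN here, whereas the apply version above would raise on .iloc[-1].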
Is there a better solution for this task? Thank you for your help!