Question

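I need to merge 2 pandas dataframes on dates, where df1.date falls within the 2 months before df2.date. I then want to count how many traders traded the same stock during that period and total the shares they purchased.

I have tried the approach listed below, but found it far too complicated. I believe there must be a smarter/simpler solution.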

Pandas: how to merge two dataframes on offset dates?

Example dataset below:

DF1 (team_1):

date        shares  symbol  trader
31/12/2013  154     FDX     Max
30/06/2016  2367    GOOGL   Max
21/07/2015  293     ORCL    Max
18/07/2015  304     ORCL    Sam

DF2 (team_2):

date        shares  symbol  trader
23/08/2015  345     ORCL    John
04/07/2014  567     FB      John
06/12/2013  221     ACER    Sally
30/11/2012  889     HP      John
05/06/2010  445     ABBV    Kate

Required output:

date        shares  symbol  trader  team_2_traders  team_2_shares_bought
23/08/2015  345     ORCL    John    2               597
04/07/2014  567     FB      John    0               0
06/12/2013  221     ACER    Sally   0               0
30/11/2012  889     HP      John    0               0
05/06/2010  445     ABBV    Kate    0               0

This adds 2 new columns: 'team_2_traders' = the number of traders from team_1 who traded the same stock during the 2 months before the date listed in DF2. 'team_2_shares_bought' = the total shares bought by team_1 during the 2 months before the date listed in DF2.

If anyone is willing to take a crack at this, please use the data above to set up the dataframes. Keep in mind that the actual dataset contains millions of rows and 6,000 company stocks.
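For anyone who wants to reproduce this, here is a minimal setup sketch rebuilt from the example tables. It is not the asker's original snippet. The variable names `df1` and `df2` follow the ones used in the answer, and the dates are kept as day-first strings, as they appear in the tables.

```python
import pandas as pd

# Rebuilt from the example tables in the question (an assumption, not the
# asker's original setup code). Dates stay as dd/mm/yyyy strings.
df1 = pd.DataFrame({
    "date": ["31/12/2013", "30/06/2016", "21/07/2015", "18/07/2015"],
    "shares": [154, 2367, 293, 304],
    "symbol": ["FDX", "GOOGL", "ORCL", "ORCL"],
    "trader": ["Max", "Max", "Max", "Sam"],
})

df2 = pd.DataFrame({
    # 30/11/2012 for the HP row, as in the answer's printed output
    # (November has 30 days).
    "date": ["23/08/2015", "04/07/2014", "06/12/2013", "30/11/2012", "05/06/2010"],
    "shares": [345, 567, 221, 889, 445],
    "symbol": ["ORCL", "FB", "ACER", "HP", "ABBV"],
    "trader": ["John", "John", "Sally", "John", "Kate"],
})

print(df1)
print(df2)
```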

Appreciate the help - thank you.

Answer 1

Please check my solution.

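import pandas as pd
from pandas.tseries.offsets import MonthEnd

# Pair every df2 row with the df1 rows for the same symbol;
# overlapping columns get the suffixes _x (df2) and _y (df1).
df_ = df2.merge(df1, on=['symbol'])
# The dates are day-first strings (dd/mm/yyyy), so parse them with an explicit format.
df_['date_x'] = pd.to_datetime(df_['date_x'], format='%d/%m/%Y')
df_['date_y'] = pd.to_datetime(df_['date_y'], format='%d/%m/%Y')

# Keep the df1 trades whose date plus the offset falls after the df2 date.
# Note: MonthEnd(2) rolls date_y forward to the second month-end after it,
# which is not the same as adding two calendar months.
df_2m = df_[df_['date_x'] < df_['date_y'] + MonthEnd(2)] \
        .loc[:, ['date_y', 'shares_y', 'symbol', 'trader_y']] \
        .groupby('symbol')

# Per symbol: total shares bought and number of matching trades.
df1_ = pd.concat([df_2m['shares_y'].sum(), df_2m['trader_y'].count()], axis=1)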

print(df1_)

        shares_y  trader_y
symbol                    
ORCL         597         2

print(df2.merge(df1_.reset_index(), on='symbol', how='left').fillna(0))

         date  shares symbol trader  shares_y  trader_y
0  23/08/2015     345   ORCL   John     597.0       2.0
1  04/07/2014     567     FB   John       0.0       0.0
2  06/12/2013     221   ACER  Sally       0.0       0.0
3  30/11/2012     889     HP   John       0.0       0.0
4  05/06/2010     445   ABBV   Kate       0.0       0.0
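A possible variation on the same merge-then-filter idea, offered as a sketch rather than a definitive implementation. It assumes "within the 2 months before" means df2.date minus two calendar months up to and including df2.date, uses `pd.DateOffset(months=2)` instead of `MonthEnd(2)`, aggregates per df2 row (so a symbol can appear on several df2 rows), and counts distinct traders. The helper name `add_team_stats` and the `row_id` and `_t1` names are made up for illustration.

```python
import pandas as pd

def add_team_stats(df1, df2, months=2):
    """For each df2 row, summarise df1 trades in the same symbol made
    during the `months` calendar months up to and including the df2 date."""
    left = df2.reset_index(drop=True).copy()
    right = df1.copy()
    left["date"] = pd.to_datetime(left["date"], format="%d/%m/%Y")
    right["date"] = pd.to_datetime(right["date"], format="%d/%m/%Y")
    left["row_id"] = left.index  # hypothetical helper column: one id per df2 row

    # All (df2 row, df1 trade) pairs sharing a symbol; df1 columns get "_t1".
    pairs = left.merge(right, on="symbol", suffixes=("", "_t1"))
    window_start = pairs["date"] - pd.DateOffset(months=months)
    in_window = (pairs["date_t1"] >= window_start) & (pairs["date_t1"] <= pairs["date"])

    stats = (
        pairs[in_window]
        .groupby("row_id")
        .agg(team_2_traders=("trader_t1", "nunique"),
             team_2_shares_bought=("shares_t1", "sum"))
    )

    out = left.join(stats, on="row_id").drop(columns="row_id")
    cols = ["team_2_traders", "team_2_shares_bought"]
    out[cols] = out[cols].fillna(0).astype(int)  # df2 rows with no match get 0
    return out

# Sample data taken from the tables in the question.
df1 = pd.DataFrame({
    "date": ["31/12/2013", "30/06/2016", "21/07/2015", "18/07/2015"],
    "shares": [154, 2367, 293, 304],
    "symbol": ["FDX", "GOOGL", "ORCL", "ORCL"],
    "trader": ["Max", "Max", "Max", "Sam"],
})
df2 = pd.DataFrame({
    "date": ["23/08/2015", "04/07/2014", "06/12/2013", "30/11/2012", "05/06/2010"],
    "shares": [345, 567, 221, 889, 445],
    "symbol": ["ORCL", "FB", "ACER", "HP", "ABBV"],
    "trader": ["John", "John", "Sally", "John", "Kate"],
})

out = add_team_stats(df1, df2)
print(out)  # ORCL row: 2 traders, 597 shares; every other row: 0 and 0
```

Note that merging on symbol materialises every df2/df1 pair for a symbol before filtering, so on millions of rows it may be worth processing one symbol (or a batch of symbols) at a time to keep memory in check.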

Merging two pandas DataFrames where the date field is within two months