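I am merging two DataFrames and, after matching, want to access the first n rows of a column after each match.
Where events_df['event'] matches prices_df['date'] and events_df['ticker'] matches prices_df['tic'], I want to keep prices_df['price'].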
events_df
event ticker
0 01-01-2019 MSFT
1 12-12-2018 MSFT
2 12-11-2018 MSFT
3 02-03-2019 AAPL
4 12-12-2018 AAPL
5 12-11-2018 AAPL
6 01-01-2019 AAPL
prices_df
date tic price
0 01-01-2019 MSFT 1.0
1 02-01-2019 MSFT 1.1
2 03-01-2019 MSFT 1.2
3 04-01-2019 MSFT 1.3
4 05-01-2019 MSFT 1.4
5 01-01-2019 AAPL 2.0
6 02-01-2019 AAPL 2.1
7 03-01-2019 AAPL 2.2
8 04-01-2019 AAPL 2.3
9 05-01-2019 AAPL 2.4
I have already tried merging:
merged = events_df.merge(prices_df,left_on=['ticker','event'],right_on=['tic','date'])
Expected output for n = 4 (from the matched events_df['event'] rows at index 0 and 6):
date ticker price
0 01-01-2019 MSFT 1.0
1 02-01-2019 MSFT 1.1
2 03-01-2019 MSFT 1.2
3 04-01-2019 MSFT 1.3
4 01-01-2019 AAPL 2.0
5 02-01-2019 AAPL 2.1
6 03-01-2019 AAPL 2.2
7 04-01-2019 AAPL 2.3
Answer 0: (score: 0)
Use:
# sample data changed to be more general (an extra row before the first match)
print (prices_df)
date tic price
0 01-01-2018 MSFT 1.0
1 01-01-2019 MSFT 1.0
2 02-01-2019 MSFT 1.1
3 03-01-2019 MSFT 1.2
4 04-01-2019 MSFT 1.3
5 05-01-2019 MSFT 1.4
6 01-01-2019 AAPL 2.0
7 02-01-2019 AAPL 2.1
8 03-01-2019 AAPL 2.2
9 04-01-2019 AAPL 2.3
10 05-01-2019 AAPL 2.4
import numpy as np

# n = rows to keep after each match, k = rows to keep before it
n = 2
k = 1
# keep the original prices_df index as an 'idx' column so it is not lost in the merge
idx = events_df.merge(prices_df.rename_axis('idx').reset_index(),
left_on=['ticker','event'],
right_on=['tic','date'])['idx']
print (idx)
0 1
1 6
Name: idx, dtype: int64
# number the groups started by each matched index; [::-1] reverses the order to count from the end
s1 = prices_df.index.isin(idx).cumsum()
s2 = prices_df.index.isin(idx)[::-1].cumsum()
# replace the group before the first match (and the one after the last match) with NaN
up = np.where(s1 != 0, s1, np.nan)
lo = np.where(s2[::-1] != 0, s2[::-1] , np.nan)
# count rows within each group, compare with le (<=), and exclude the NaN groups (first, last)
prices_df['um'] = prices_df.groupby(up).cumcount().le(n) & ~np.isnan(up)
prices_df['lm'] = prices_df.groupby(lo).cumcount(ascending=False).le(k) & ~np.isnan(lo)
print (prices_df)
date tic price um lm
0 01-01-2018 MSFT 1.0 False True
1 01-01-2019 MSFT 1.0 True True
2 02-01-2019 MSFT 1.1 True False
3 03-01-2019 MSFT 1.2 True False
4 04-01-2019 MSFT 1.3 False False
5 05-01-2019 MSFT 1.4 False True
6 01-01-2019 AAPL 2.0 True True
7 02-01-2019 AAPL 2.1 True False
8 03-01-2019 AAPL 2.2 True False
9 04-01-2019 AAPL 2.3 False False
10 05-01-2019 AAPL 2.4 False False
#filter by boolean indexing
mask = prices_df['um'] | prices_df['lm']
prices_df = prices_df[mask]
print (prices_df)
date tic price um lm
0 01-01-2018 MSFT 1.0 False True
1 01-01-2019 MSFT 1.0 True True
2 02-01-2019 MSFT 1.1 True False
3 03-01-2019 MSFT 1.2 True False
5 05-01-2019 MSFT 1.4 False True
6 01-01-2019 AAPL 2.0 True True
7 02-01-2019 AAPL 2.1 True False
8 03-01-2019 AAPL 2.2 True False
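For the simpler case in the question (only the matched row and the rows after it, n rows in total per match), the same idea of carrying the row position through the merge can be sketched more directly. This is a minimal sketch, not the method above: the helper name first_n_from_match and the 'pos' column are hypothetical, and it assumes prices_df is already ordered by date within each ticker and that a window should never cross into another ticker.

```python
import pandas as pd

# Sample data from the question
events_df = pd.DataFrame({
    "event": ["01-01-2019", "12-12-2018", "12-11-2018", "02-03-2019",
              "12-12-2018", "12-11-2018", "01-01-2019"],
    "ticker": ["MSFT", "MSFT", "MSFT", "AAPL", "AAPL", "AAPL", "AAPL"],
})
prices_df = pd.DataFrame({
    "date": ["01-01-2019", "02-01-2019", "03-01-2019",
             "04-01-2019", "05-01-2019"] * 2,
    "tic": ["MSFT"] * 5 + ["AAPL"] * 5,
    "price": [1.0, 1.1, 1.2, 1.3, 1.4, 2.0, 2.1, 2.2, 2.3, 2.4],
})


def first_n_from_match(events, prices, n):
    """Return the matched price row plus the following rows (n rows in total)
    for every event, staying within the same ticker (hypothetical helper)."""
    # Store each row's position in a 'pos' column so it survives the merge.
    pos = prices.reset_index(drop=True).rename_axis("pos").reset_index()
    hits = events.merge(pos, left_on=["ticker", "event"],
                        right_on=["tic", "date"])
    pieces = []
    for start, tic in zip(hits["pos"], hits["tic"]):
        window = pos.iloc[start:start + n]
        # Drop rows that spilled over into the next ticker.
        pieces.append(window[window["tic"] == tic])
    if not pieces:
        return pd.DataFrame(columns=["date", "ticker", "price"])
    out = pd.concat(pieces).drop_duplicates("pos").sort_values("pos")
    out = out[["date", "tic", "price"]].rename(columns={"tic": "ticker"})
    return out.reset_index(drop=True)


result = first_n_from_match(events_df, prices_df, n=4)
print(result)
```

On the question's sample data this yields the eight rows listed in the expected output. The loop over matches keeps the logic easy to follow; with many matches, the grouped cumcount approach above is likely to scale better.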
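Answer 1: (score: 0)
Your merge looks fine. You only need to extract the columns you want from it, because right after merging it contains all columns from both DataFrames. So: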
merged = events_df.merge(prices_df, left_on=['ticker', 'event'], right_on=['tic', 'date'])
merged = merged[['date', 'ticker', 'price']]
Then you have to filter it so that the price is below 3 (or n, if you prefer):
n = 3
merged = merged[merged['price'] < n]
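The column selection can be checked on a cut-down version of the question's data. The sketch below assumes a reduced sample (three events, four price rows) and a hypothetical variable name, selected. It shows that the merged frame carries the columns of both inputs and that selecting several columns requires a list inside the brackets.

```python
import pandas as pd

# Reduced sample (assumption: a subset of the question's data)
events_df = pd.DataFrame({
    "event": ["01-01-2019", "12-12-2018", "01-01-2019"],
    "ticker": ["MSFT", "MSFT", "AAPL"],
})
prices_df = pd.DataFrame({
    "date": ["01-01-2019", "02-01-2019", "01-01-2019", "02-01-2019"],
    "tic": ["MSFT", "MSFT", "AAPL", "AAPL"],
    "price": [1.0, 1.1, 2.0, 2.1],
})

merged = events_df.merge(prices_df,
                         left_on=["ticker", "event"],
                         right_on=["tic", "date"])
# Columns of the left frame come first, then those of the right frame.
print(list(merged.columns))

# Double brackets: a list of column names returns a DataFrame.
selected = merged[["date", "ticker", "price"]]
print(selected)
```

Note that an inner merge returns only the rows where an event matches a price row, so selected holds one row per matched event here. The rows that follow each match still have to be taken from prices_df.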