I have two DataFrames, df_a and df_b.
How can I select only the rows of df_b whose index entries are shared with df_a?
import pandas as pd

df_a = pd.DataFrame(data=[['A', 'B', 'C'], ['A1', 'B1', 'C1']], columns=['first', 'secound', 'third'])
df_a.set_index(['first', 'secound'], inplace=True)
df_b = pd.DataFrame(data=[['A', 'B', 12], ['A', 'B', 143], ['C1', 'C1', 11]], columns=['first', 'secound', 'data'])
df_b.set_index(['first', 'secound'], inplace=True)
df_a:
               third
first secound
A     B            C
A1    B1          C1

df_b:
               data
first secound
A     B          12
      B         143
C1    C1         11
Thanks for your help.
Answer 0 (score: 5)
You can take the intersection of the two indexes and use it as an indexer for df_b.loc:
In [28]: df_b.loc[df_b.index.intersection(df_a.index)]
Out[28]:
               data
first secound
A     B          12
      B         143
Alternatively, use isin to generate a boolean mask for df_b.loc:
In [32]: df_b.loc[df_b.index.isin(df_a.index)]
Out[32]:
               data
first secound
A     B          12
      B         143
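The two approaches above can be run side by side on the DataFrames from the question. This is only a minimal sketch: the names mask, shared, by_mask and by_intersection are introduced here for illustration and are not part of the original answer.

```python
import pandas as pd

# Rebuild the DataFrames from the question (column name 'secound' kept as-is).
df_a = pd.DataFrame(
    [['A', 'B', 'C'], ['A1', 'B1', 'C1']],
    columns=['first', 'secound', 'third'],
).set_index(['first', 'secound'])

df_b = pd.DataFrame(
    [['A', 'B', 12], ['A', 'B', 143], ['C1', 'C1', 11]],
    columns=['first', 'secound', 'data'],
).set_index(['first', 'secound'])

# Approach 1: boolean mask, one entry per row of df_b.
mask = df_b.index.isin(df_a.index)
by_mask = df_b.loc[mask]

# Approach 2: label-based selection with the shared index entries.
shared = df_b.index.intersection(df_a.index)
by_intersection = df_b.loc[shared]

print(by_mask)
print(by_intersection)
```

The mask keeps the rows of df_b in their original order, which may matter when df_b contains duplicate index entries, as it does here.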
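Using isin appears to be the fastest option in a perfplot benchmark (the plot itself is not reproduced here).
This is the setup used to generate that perfplot: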
import numpy as np
import pandas as pd
import perfplot

def isin(x):
    df_a, df_b = x
    return df_b.loc[df_b.index.isin(df_a.index)]

def intersection(x):
    df_a, df_b = x
    return df_b.loc[df_b.index.intersection(df_a.index)]

def join(x):
    df_a, df_b = x
    return df_a.drop(df_a.columns, axis=1).join(df_b).dropna()

def make_df(n):
    df = pd.DataFrame(np.random.randint(10, size=(n, 3)))
    df = df.set_index([0, 1])
    return df

perfplot.show(
    setup=lambda n: [make_df(n) for i in range(2)],
    kernels=[isin, intersection, join],
    n_range=[2**k for k in range(2, 15)],
    logx=True,
    logy=True,
    equality_check=False,  # rows may appear in different order
    xlabel='len(df)')
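If perfplot is not installed, a rough comparison can be made with the standard-library timeit module instead. The following is a sketch under assumptions: the size of 1000 rows, the seeds, and the repeat counts are arbitrary choices made here, and the function signatures differ slightly from the setup above. Timings depend on the machine and pandas version, so no particular result is implied.

```python
import timeit

import numpy as np
import pandas as pd

def make_df(n, seed):
    # Random integer data with a two-level index, mirroring the setup above.
    rng = np.random.default_rng(seed)
    df = pd.DataFrame(rng.integers(10, size=(n, 3)))
    return df.set_index([0, 1])

def isin(df_a, df_b):
    return df_b.loc[df_b.index.isin(df_a.index)]

def intersection(df_a, df_b):
    return df_b.loc[df_b.index.intersection(df_a.index)]

df_a, df_b = make_df(1000, seed=0), make_df(1000, seed=1)

# Best of 3 repeats, 10 calls each, for every candidate.
timings = {}
for func in (isin, intersection):
    runs = timeit.repeat(lambda: func(df_a, df_b), number=10, repeat=3)
    timings[func.__name__] = min(runs)

for name, seconds in timings.items():
    print(f"{name}: {seconds:.4f} s for 10 calls")
```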
Answer 1 (score: 2)
You can join df_a's index to df_b and then drop the NaNs:
>>> df_a.drop(df_a.columns, axis=1).join(df_b).dropna()
                data
first secound
A     B         12.0
      B        143.0
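Note that the data column comes back as float (12.0, 143.0) because the default left join introduces NaN for the unmatched key before dropna removes it, and dropna would also discard any df_b rows that legitimately contain missing values. One possible variation, not part of the original answer, is to pass how='inner' to join so that unmatched keys are never introduced. The names keys_only and joined below are illustrative.

```python
import pandas as pd

# Rebuild the DataFrames from the question (column name 'secound' kept as-is).
df_a = pd.DataFrame(
    [['A', 'B', 'C'], ['A1', 'B1', 'C1']],
    columns=['first', 'secound', 'third'],
).set_index(['first', 'secound'])

df_b = pd.DataFrame(
    [['A', 'B', 12], ['A', 'B', 143], ['C1', 'C1', 11]],
    columns=['first', 'secound', 'data'],
).set_index(['first', 'secound'])

# Keep only df_a's index (no columns), then inner-join df_b onto it.
keys_only = df_a.drop(df_a.columns, axis=1)
joined = keys_only.join(df_b, how='inner')

print(joined)
print(joined['data'].dtype)
```

Since no NaN is inserted, the integer dtype of the data column should be preserved in this variant.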