Question

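I am trying to figure out whether I can join/merge/concat two tables in a way that is not an 'outer' join: using pandas built-in options, I want to pick only the IDs from the second table that are not already in the first.

Right now I am doing the following, and I feel my code is not very elegant: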

import pandas as pd

a = [['a', '1.2', '4.2'], ['b', '70', '0.03'], ['c', '8', '1']]
b = [['a', '52', '49'], ['b', '23', '0.05'], ['x', '5', '0']]
df1 = pd.DataFrame(a, columns=['id_col', 'two', 'three'])
df2 = pd.DataFrame(b, columns=['id_col', 'two', 'three'])


# remove df2 entries also in df1
different_ids = set(df2.id_col).difference(set(df1.id_col))
df2 = df2[df2.id_col.isin(different_ids)]
# merge data frames
df_merged = pd.concat([df1,df2])

The merged df should contain the entries a, b, c from df1 and x from df2.

Answer 1

I think you can do all of this by subsetting df2 with isin on id_col against df1.id_col, and then concatenating df1 with the resulting dataframe:

res = pd.concat([df1, df2[~df2.id_col.isin(df1.id_col)]])

In [186]: res
Out[186]:
  id_col  two three
0      a  1.2   4.2
1      b   70  0.03
2      c    8     1
2      x    5     0

Timing:

In [23]: %timeit pd.concat((df1, df2)).drop_duplicates('id_col')
100 loops, best of 3: 1.95 ms per loop

In [24]: %timeit pd.concat([df1, df2[~df2.id_col.isin(df1.id_col)]])
100 loops, best of 3: 1.79 ms per loop

Judging by this timing comparison, the isin approach is slightly faster (1.79 ms versus 1.95 ms per loop).
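For anyone who wants to reuse the isin approach, here is a minimal self-contained sketch. The helper name append_new_ids and its key parameter are made up for illustration and are not part of pandas; ignore_index=True is an optional extra that renumbers the rows so the index label 2 is not repeated.

```python
import pandas as pd


def append_new_ids(first, second, key="id_col"):
    """Return `first` plus the rows of `second` whose key is absent from `first`.

    Hypothetical helper for illustration; not a pandas function.
    """
    # True for rows of `second` whose key does not occur in `first`.
    is_new = ~second[key].isin(first[key])
    # ignore_index=True renumbers the result 0..n-1 instead of keeping
    # the original (repeated) index labels.
    return pd.concat([first, second[is_new]], ignore_index=True)


a = [['a', '1.2', '4.2'], ['b', '70', '0.03'], ['c', '8', '1']]
b = [['a', '52', '49'], ['b', '23', '0.05'], ['x', '5', '0']]
df1 = pd.DataFrame(a, columns=['id_col', 'two', 'three'])
df2 = pd.DataFrame(b, columns=['id_col', 'two', 'three'])

res = append_new_ids(df1, df2)
print(res)
```

Drop ignore_index=True if you need to keep the original row labels, as in the output shown above.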

Answer 2

You can concat df1 and df2 and then drop_duplicates on id_col.

>>> df = pd.concat((df1, df2))
>>> print(df.drop_duplicates('id_col'))
  id_col  two three
0      a  1.2   4.2
1      b   70  0.03
2      c    8     1
2      x    5     0
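One thing worth checking before choosing between the two approaches: they only agree when id_col is already unique within df1. The sketch below uses made-up data (not the frames from the question) in which df1 repeats the id 'a'. drop_duplicates keeps the first occurrence of every id, so it also collapses the repeat inside df1, while the isin filter only touches df2.

```python
import pandas as pd

# Illustrative data only: df1 deliberately contains the id 'a' twice.
df1 = pd.DataFrame({'id_col': ['a', 'a', 'b'], 'two': [1, 2, 3]})
df2 = pd.DataFrame({'id_col': ['b', 'x'], 'two': [4, 5]})

# Filter df2 first, then concatenate: df1 is left untouched.
via_isin = pd.concat([df1, df2[~df2.id_col.isin(df1.id_col)]])

# Concatenate first, then keep the first row seen for each id.
via_drop = pd.concat([df1, df2]).drop_duplicates('id_col')

print(via_isin.id_col.tolist())  # ['a', 'a', 'b', 'x']
print(via_drop.id_col.tolist())  # ['a', 'b', 'x']
```

If the ids in df1 are guaranteed to be unique, as in the question, both approaches return the same rows.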

Pandas merging tables: only the different IDs from the second table
