Question

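I have 2 pandas DataFrames that I want to merge into 1 new DataFrame. I have a list of tuples where the first element of each tuple is the index of a row in the first DataFrame, and the second element is the index of a row in the second DataFrame.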

Here is an example:

### input sample

# table A
    col_a   col_b
0   1       2
1   4       5
2   7       8
3   1       1

# table B
    col_c   col_d
0   3       3
1   9       8
2   7       3
3   2       1

list_of_couples = [(0,1),(3,0)] # (index from table A, index from table B)

### expected output

    col_a   col_b   col_c   col_d
0   1       2       9       8
1   1       1       3       3

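I tried iterating over the list of tuples and appending the merged rows to a new DataFrame one by one, but this takes a long time.
How can I do this efficiently? Thanks!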
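A loop-free sketch for comparison (not taken from either answer below, and assuming both frames have unique index labels): split the couples into two label lists, select all the requested rows from each table with a single `.loc` call, renumber both selections from 0, and place them side by side with `pd.concat`. The names `idx_a`, `idx_b`, `rows_a`, `rows_b` and `result` are illustrative.

```python
import pandas as pd

# Sample data from the question
table_a = pd.DataFrame({"col_a": [1, 4, 7, 1], "col_b": [2, 5, 8, 1]})
table_b = pd.DataFrame({"col_c": [3, 9, 7, 2], "col_d": [3, 8, 3, 1]})
list_of_couples = [(0, 1), (3, 0)]  # (index from table A, index from table B)

# Split the couples into two label lists: [0, 3] for A and [1, 0] for B
idx_a, idx_b = map(list, zip(*list_of_couples))

# One vectorized .loc selection per table, then drop the original labels
# so both selections are numbered 0..n-1 in the order of list_of_couples
rows_a = table_a.loc[idx_a].reset_index(drop=True)
rows_b = table_b.loc[idx_b].reset_index(drop=True)

# Rows now line up by position, so a column-wise concat pairs them up
result = pd.concat([rows_a, rows_b], axis=1)
print(result)
```

Because the rows are picked in the order the couples are listed, the output follows `list_of_couples` rather than the index order of either table.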

Answer 1

You can create a DataFrame from the list of tuples and then merge twice. For example:

import pandas as pd

# Create a DataFrame from the list of tuples: column 'a' holds table A labels, 'b' holds table B labels
tuple_df = pd.DataFrame(list_of_couples, columns=['a', 'b'])

# Merge table_a with tuples
merged = pd.merge(table_a, tuple_df, left_index=True, right_on='a')

# Merge result with table_b
merged = pd.merge(merged, table_b, right_index=True, left_on='b')

# Removing intermediate join columns
merged = merged.drop(['a','b'], axis=1)

>>> print(merged)

  col_a col_b col_c col_d
0     1     2     9     8
1     1     1     3     3
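A self-contained run of this answer's approach, with the question's sample data filled in and one assumed addition: `reset_index(drop=True)` at the end, so the row labels are always 0..n-1 whatever index the merges leave behind.

```python
import pandas as pd

# Sample data from the question
table_a = pd.DataFrame({"col_a": [1, 4, 7, 1], "col_b": [2, 5, 8, 1]})
table_b = pd.DataFrame({"col_c": [3, 9, 7, 2], "col_d": [3, 8, 3, 1]})
list_of_couples = [(0, 1), (3, 0)]

# One row per couple: 'a' is the table_a label, 'b' is the table_b label
tuple_df = pd.DataFrame(list_of_couples, columns=["a", "b"])

# Join table_a's index against column 'a', then table_b's index against column 'b'
merged = pd.merge(table_a, tuple_df, left_index=True, right_on="a")
merged = pd.merge(merged, table_b, right_index=True, left_on="b")

# Drop the helper columns and renumber the rows
merged = merged.drop(columns=["a", "b"]).reset_index(drop=True)
print(merged)
```

Per the pandas documentation, an inner merge preserves the order of the left keys, so the first merge returns rows in `table_a`'s index order. In this example that happens to match the order of `list_of_couples`; if the couples were listed in a different order, the output rows would still follow `table_a`.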

Answer 2

I would try creating a temporary key to join on:

# Unzip list_of_couples into the labels for table A (a) and table B (b)
a, b  = zip(*list_of_couples)

# Give row a[i] of df_a and row b[i] of df_b the same key value i
for i in range(len(a)):
    df_a.loc[a[i], 'key'] = i
    df_b.loc[b[i], 'key'] = i

# Merge on 'key'. Rows that got no key are NaN, and pandas matches NaN keys
# to each other, so drop those rows, then drop the temporary 'key' column
df_a.merge(df_b, on='key').dropna(subset=['key']).drop('key', axis=1)

Output:

   col_a  col_b  col_c  col_d
0      1      2      9      8
5      1      1      3      3
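A self-contained sketch of this answer's temporary-key idea, with two assumed changes: the keys are assigned with one index-aligned `pd.Series` per table instead of a per-row loop, and the work is done on copies (`left`, `right`, both illustrative names) so `df_a` and `df_b` do not gain a `key` column. A final `reset_index(drop=True)` is also added, which is why the second row is labelled 1 rather than 5.

```python
import pandas as pd

# Sample data from the question
df_a = pd.DataFrame({"col_a": [1, 4, 7, 1], "col_b": [2, 5, 8, 1]})
df_b = pd.DataFrame({"col_c": [3, 9, 7, 2], "col_d": [3, 8, 3, 1]})
list_of_couples = [(0, 1), (3, 0)]

# Unzip into table A labels (0, 3) and table B labels (1, 0)
a, b = zip(*list_of_couples)

# Assignment aligns on the index: row a[i] / b[i] gets key i, other rows get NaN
left = df_a.copy()
right = df_b.copy()
left["key"] = pd.Series(range(len(a)), index=list(a))
right["key"] = pd.Series(range(len(b)), index=list(b))

# pandas pairs NaN keys with each other, so the keyless rows show up as
# NaN-NaN matches; dropna removes them before the helper column is dropped
result = (
    left.merge(right, on="key")
    .dropna(subset=["key"])
    .drop(columns="key")
    .reset_index(drop=True)
)
print(result)
```

Dropping the NaN-key rows from `left` and `right` before merging would give the same result while avoiding the NaN-NaN pairs, which grow with the product of the unmatched row counts.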

How to merge 2 DataFrames based on a list of tuples, where each tuple holds the related key of each DataFrame?
