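I have 2 pandas DataFrames that I want to combine into 1 new DataFrame. I also have a list of tuples: the first element of each tuple is the index of a row in the first DataFrame, and the second element is the index of a row in the second DataFrame. Each tuple should produce one row of the new DataFrame, with the columns of both rows side by side.
Here is an example: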
### input sample
# table A
   col_a  col_b
0      1      2
1      4      5
2      7      8
3      1      1
# table B
   col_c  col_d
0      3      3
1      9      8
2      7      3
3      2      1
list_of_couples = [(0,1),(3,0)] # (index from table A, index from table B)
### expected output
   col_a  col_b  col_c  col_d
0      1      2      9      8
1      1      1      3      3
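I tried looping over the list of tuples and appending the merged rows to a new df one at a time, but that takes a lot of time.
How can I do this efficiently? Thanks!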
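To make the answers below runnable, here is a minimal sketch that rebuilds the sample data from the question, assuming the default integer index shown in the tables. The names `table_a` and `table_b` follow the first answer (the second answer calls them `df_a` and `df_b`), and `expected` is a hypothetical name for the target output.

```python
import pandas as pd

# Sample tables from the question (default RangeIndex 0..3)
table_a = pd.DataFrame({"col_a": [1, 4, 7, 1], "col_b": [2, 5, 8, 1]})
table_b = pd.DataFrame({"col_c": [3, 9, 7, 2], "col_d": [3, 8, 3, 1]})

# (index from table A, index from table B)
list_of_couples = [(0, 1), (3, 0)]

# Hypothetical name for the expected result: one row per couple
expected = pd.DataFrame(
    {"col_a": [1, 1], "col_b": [2, 1], "col_c": [9, 3], "col_d": [8, 3]}
)
```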
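Answer 0 (score: 0)
You can build a DataFrame from the list of tuples and then merge twice: once to attach the rows of table_a, once to attach the rows of table_b. For example: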
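import pandas as pd

# table_a and table_b are the two DataFrames from the question
# Create a df from the list of tuples: column 'a' holds table_a labels, 'b' holds table_b labels
tuple_df = pd.DataFrame(list_of_couples, columns=['a', 'b'])
# Merge table_a's index against column 'a'
merged = pd.merge(table_a, tuple_df, left_index=True, right_on='a')
# Merge the result's column 'b' against table_b's index
merged = pd.merge(merged, table_b, right_index=True, left_on='b')
# Remove the intermediate join columns
merged = merged.drop(['a', 'b'], axis=1)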
>>> print(merged)
   col_a  col_b  col_c  col_d
0      1      2      9      8
1      1      1      3      3
Answer 1 (score: 0)
I would try creating a temporary key to join on:
import pandas as pd

# df_a and df_b are the two DataFrames from the question
# Unzip list_of_couples into the row labels for df_a and df_b
a, b = zip(*list_of_couples)
# Give the i-th couple the same key value in both tables; rows not in any couple get NaN
# Note: this adds a 'key' column to df_a and df_b in place
for i in range(len(a)):
    df_a.loc[a[i], 'key'] = i
    df_b.loc[b[i], 'key'] = i
# Merge on 'key' (NaN keys match each other here), drop those NaN-key rows, then drop the temporary 'key' column
result = df_a.merge(df_b, on='key').dropna(subset=['key']).drop('key', axis=1)
print(result)
Output:
   col_a  col_b  col_c  col_d
0      1      2      9      8
5      1      1      3      3
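For comparison, here is a minimal sketch of a third approach that neither answer uses. Since the couples are already row labels, each table can be indexed directly with `.loc` in couple order, and the two selections can be concatenated side by side. It assumes both tables have unique index labels and that every label in `list_of_couples` exists (otherwise `.loc` raises a `KeyError`). Unlike the key-based loop, which overwrites the key of a label that appears in more than one couple, this simply selects that row again. The names `idx_a` and `idx_b` are illustrative.

```python
import pandas as pd

# Sample data from the question
table_a = pd.DataFrame({"col_a": [1, 4, 7, 1], "col_b": [2, 5, 8, 1]})
table_b = pd.DataFrame({"col_c": [3, 9, 7, 2], "col_d": [3, 8, 3, 1]})
list_of_couples = [(0, 1), (3, 0)]

# Split the couples into the row labels to pick from each table
idx_a, idx_b = map(list, zip(*list_of_couples))

# Select the rows in couple order, drop their original labels so both
# pieces share a 0..n-1 index, then place them side by side
merged = pd.concat(
    [
        table_a.loc[idx_a].reset_index(drop=True),
        table_b.loc[idx_b].reset_index(drop=True),
    ],
    axis=1,
)
print(merged)
```

In the key-based output above, the index values 0 and 5 come from the intermediate merge: the NaN-key rows matched each other before `dropna` removed them. Appending `.reset_index(drop=True)` to that chain would renumber the result as 0, 1.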