Question

I have two DataFrames (f1_df and f2_df):

f1_df looks like:

ID,Name,Gender
1,Smith,M
2,John,M

f2_df looks like:

name,gender,city,id

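Problem:

I want the code to compare the headers of f1_df with those of f2_df and, using pandas, copy the data for the matching columns.

Output:

The output should look like this: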

name,gender,city,id  # name, gender, and id are the only columns that match between f1_df and f2_df
Smith,M, ,1          # the data copied for the name, gender, and id columns
John,M, ,2

I am new to pandas and not sure how to approach this. I tried an inner join on the matching columns, but that did not work.

Here is what I have so far:

import pandas as pd

f1_df = pd.read_csv("file1.csv")
f2_df = pd.read_csv("file2.csv")

for i in f1_df:
    for j in f2_df:
        i = i.lower()
        if i == j:
            joined = f1_df.join(f2_df)
print(joined)

Any idea how to solve this?

Answer 1

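If you want to merge/join the DataFrames on their common columns, try this:

First, convert all column names to lowercase (df1 and df2 here correspond to f1_df and f2_df in the question):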

df1.columns = df1.columns.str.lower()
df2.columns = df2.columns.str.lower()
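
To check the header comparison on its own before joining anything, here is a minimal sketch. It assumes the two files look exactly like the samples in the question; the `io.StringIO` objects are just in-memory stand-ins for file1.csv and file2.csv, and the `matching` list is a name introduced only for this illustration.

```python
import io

import pandas as pd

# In-memory stand-ins for file1.csv and file2.csv from the question (assumed contents).
file1 = io.StringIO("ID,Name,Gender\n1,Smith,M\n2,John,M\n")
file2 = io.StringIO("name,gender,city,id\n")  # header only, no data rows

f1_df = pd.read_csv(file1)
f2_df = pd.read_csv(file2)

# Normalize header case so that "Name" and "name" compare equal.
f1_df.columns = f1_df.columns.str.lower()
f2_df.columns = f2_df.columns.str.lower()

# Headers present in both files, listed in f2_df's column order.
matching = [col for col in f2_df.columns if col in f1_df.columns]
print(matching)  # ['name', 'gender', 'id']
```

Once the headers are lowercased, "city" is the only f2_df column with no counterpart in f1_df.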

Now we can join on the common columns:

common_cols = df2.columns.intersection(df1.columns).tolist()
joined = df1.set_index(common_cols).join(df2.set_index(common_cols)).reset_index()

Output:

In [259]: joined
Out[259]:
   id   name gender city
0   1  Smith      M  NaN
1   2   John      M  NaN

Export to CSV:

In [262]: joined.to_csv('c:/temp/joined.csv', index=False)

Contents of c:/temp/joined.csv:

id,name,gender,city
1,Smith,M,
2,John,M,
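
The question asks for the columns in f2_df's order (name,gender,city,id), whereas the join above puts the common columns first. If f2_df only supplies the header, as in the sample, one possible alternative is to skip the join and reindex f1_df's columns against f2_df's. This is a sketch under that assumption, not the method used in the answer above; the `io.StringIO` inputs stand in for the two CSV files and `result` is a name chosen for illustration.

```python
import io

import pandas as pd

# In-memory stand-ins for file1.csv and file2.csv (assumed contents from the question).
f1_df = pd.read_csv(io.StringIO("ID,Name,Gender\n1,Smith,M\n2,John,M\n"))
f2_df = pd.read_csv(io.StringIO("name,gender,city,id\n"))

# Make the header comparison case-insensitive.
f1_df.columns = f1_df.columns.str.lower()
f2_df.columns = f2_df.columns.str.lower()

# reindex keeps the columns f1_df already has and creates the missing ones
# (here "city") filled with NaN, arranged in f2_df's column order.
result = f1_df.reindex(columns=f2_df.columns)

# NaN is written as an empty field by default.
print(result.to_csv(index=False))
```

Note that this only copies f1_df's data into f2_df's layout. If f2_df had rows of its own that needed to be combined with f1_df, a join or merge would still be required.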

How do I copy matching columns between CSV files using Pandas?
