I have two CSV files with the following schemas:
CSV1 columns:
"Id","First","Last","Email","Company"
CSV2 columns:
"PersonId","FirstName","LastName","Em","FavoriteFood"
If I load each of them into a pandas DataFrame and run newdf = df1.merge(df2, how='outer', left_on=['Last', 'First'], right_on=['LastName', 'FirstName']),
the CSV export of the joined DataFrame has this schema:
"Id","First","Last","Email","Company","PersonId","FirstName","LastName","Em","FavoriteFood"
What I want is an output schema more like this:
"Id","First","Last","Email","Company","PersonId","Em","FavoriteFood"
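Most relational database software I'm familiar with behaves this way (the left-hand join column names win the naming battle). Does pandas have syntax that tells it to do this?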
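I could do df1.merge(df2.rename(columns={'LastName': 'Last', 'FirstName': 'First'}), how='outer', on=['Last', 'First']), but stylistically it drives me crazy to hard-code the same column names twice in my source. If I change the column names in the CSV files, I would also have to update this code.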
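A minimal sketch of the rename approach with each key name written only once: a single dict (the name KEY_MAP is hypothetical) maps the right-hand key names to the left-hand ones, and both the rename and the on= list are derived from it. The sample rows are made up for illustration. Because both frames then share the key names, an outer merge keeps a single First/Last pair and fills it from whichever side has the row.

```python
import pandas as pd

# Hypothetical sample rows, using the column layouts from the question.
df1 = pd.DataFrame({
    "Id": [1, 2],
    "First": ["Ann", "Bob"],
    "Last": ["Lee", "Ray"],
    "Email": ["ann@example.com", "bob@example.com"],
    "Company": ["Acme", "Initech"],
})
df2 = pd.DataFrame({
    "PersonId": [10, 11],
    "FirstName": ["Ann", "Cy"],
    "LastName": ["Lee", "Zed"],
    "Em": ["ann@home.example", "cy@home.example"],
    "FavoriteFood": ["Pie", "Tacos"],
})

# Hypothetical single source of truth: right-hand key name -> left-hand key name.
KEY_MAP = {"LastName": "Last", "FirstName": "First"}

newdf = df1.merge(
    df2.rename(columns=KEY_MAP),   # right keys now carry the left names
    how="outer",
    on=list(KEY_MAP.values()),     # ["Last", "First"], derived, not retyped
)
print(list(newdf.columns))
```

If the CSV headers change, only KEY_MAP needs editing.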
Answer (score: 0)
One approach is to merge the same way, and then drop the columns you don't want.
newdf.drop(columns=['LastName', 'FirstName'], inplace=True)
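One caveat with merge-then-drop on an outer join: when left_on and right_on use different names, pandas keeps both key columns and does not coalesce them, so rows that exist only in the right frame have NaN in First/Last. Dropping FirstName/LastName then loses those names. A minimal sketch, assuming you want the left-hand names filled from the right side before dropping; the helper merge_keep_left_keys and the sample rows are hypothetical.

```python
import pandas as pd


def merge_keep_left_keys(left, right, left_keys, right_keys, how="outer"):
    """Hypothetical helper: merge on differently named keys, keep only the left names."""
    merged = left.merge(right, how=how, left_on=left_keys, right_on=right_keys)
    # Rows found only in `right` have NaN in the left key columns;
    # copy the right-hand key values over before dropping them.
    for lk, rk in zip(left_keys, right_keys):
        merged[lk] = merged[lk].fillna(merged[rk])
    return merged.drop(columns=right_keys)


# Hypothetical sample rows, using the column layouts from the question.
df1 = pd.DataFrame({
    "Id": [1, 2],
    "First": ["Ann", "Bob"],
    "Last": ["Lee", "Ray"],
    "Email": ["ann@example.com", "bob@example.com"],
    "Company": ["Acme", "Initech"],
})
df2 = pd.DataFrame({
    "PersonId": [10, 11],
    "FirstName": ["Ann", "Cy"],
    "LastName": ["Lee", "Zed"],
    "Em": ["ann@home.example", "cy@home.example"],
    "FavoriteFood": ["Pie", "Tacos"],
})

newdf = merge_keep_left_keys(df1, df2, ["Last", "First"], ["LastName", "FirstName"])
print(list(newdf.columns))
```

For how='inner' or how='left' the fillna step is a no-op, so the helper is safe to reuse with those join types.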