Question

I have been struggling to find a solution. I've read at least a dozen posts on this topic, but nothing seems to work.

I need to merge two CSV files by ID. Both files have two columns with the same names: Org ID and Org Name. Here is my code:

First file

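import re

# Assumes `soup` (a BeautifulSoup document) and `ticker` (a search pattern) are defined elsewhere
for elem in soup(text=re.compile(ticker)):
    print(elem.parent.get("id"))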

Second file

import pandas as pd

# The file has no header row, so all nine column names are supplied; only pull_cols are loaded
name_cols = ['GUID1', 'GUID2', 'Org ID', 'Org Name', 'Org Type', 'Chapter', 'Join Date', 'Effective Date', 'Expire Date']
pull_cols = ['Org ID', 'Org Name', 'Org Type', 'Chapter', 'Join Date', 'Effective Date', 'Expire Date']

df1 = pd.read_csv(path, header=None, encoding="ISO-8859-1", names=name_cols, usecols=pull_cols, index_col='Org ID')

I'm still learning pandas, so any hints on how I should approach this would be appreciated.

Answer 1

After a discussion in chat, the main problem turned out to be that "Org ID" was being read as the index. Adding the parameter index_col=False did the trick:

df2 = pd.read_csv(path, header=None, encoding="ISO-8859-1", names=name_cols, usecols=pull_cols, index_col=False)
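To see what the index_col change does, here is a minimal sketch on a tiny in-memory CSV. The three-column sample data and the variable names are hypothetical, chosen only for illustration. It shows that index_col='Org ID' removes the ID from the columns, that index_col=False keeps it as a normal column, and that reset_index() can move an ID that was already loaded as the index back into a column.

```python
import io
import pandas as pd

# Hypothetical header-less sample data: two rows, three fields each.
raw = "100,Acme,Club\n200,Beta,Guild\n"
cols = ['Org ID', 'Org Name', 'Org Type']

# index_col='Org ID' moves the ID out of the columns and into the index.
as_index = pd.read_csv(io.StringIO(raw), header=None, names=cols, index_col='Org ID')

# index_col=False keeps every field, including 'Org ID', as an ordinary column.
as_column = pd.read_csv(io.StringIO(raw), header=None, names=cols, index_col=False)

# A frame already loaded with the ID as its index can be repaired afterwards.
restored = as_index.reset_index()

print(list(as_index.columns))   # ['Org Name', 'Org Type']
print(list(as_column.columns))  # ['Org ID', 'Org Name', 'Org Type']
print(list(restored.columns))   # ['Org ID', 'Org Name', 'Org Type']
```

The reset_index() route may be handy if the read_csv call cannot easily be changed.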

All that was left was an "inner" join:

pd.merge(df1, df2, how='inner', on=['Org ID', 'Org Name'])
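An end-to-end sketch of the whole workflow follows. It assumes two small, hypothetical header-less files, replaced here by io.StringIO buffers, and uses the column lists from the question. The helper name load is also hypothetical. It leaves index_col at its default. Because every row has exactly as many fields as name_cols, that default also keeps 'Org ID' as an ordinary column, just as index_col=False does in the answer. The inner merge keeps only IDs present in both files, and the result is written back out as a single CSV.

```python
import io
import pandas as pd

# Column layout from the question; the files have no header row.
name_cols = ['GUID1', 'GUID2', 'Org ID', 'Org Name', 'Org Type', 'Chapter',
             'Join Date', 'Effective Date', 'Expire Date']
pull_cols = ['Org ID', 'Org Name', 'Org Type', 'Chapter',
             'Join Date', 'Effective Date', 'Expire Date']

# Hypothetical in-memory stand-ins for the two CSV files.
file1 = io.StringIO(
    "g1,h1,100,Acme,Club,North,2015-01-01,2015-02-01,2016-02-01\n"
    "g2,h2,200,Beta,Guild,South,2015-03-01,2015-04-01,2016-04-01\n"
)
file2 = io.StringIO(
    "g3,h3,100,Acme,Club,North,2017-01-01,2017-02-01,2018-02-01\n"
    "g4,h4,300,Gamma,Club,East,2017-05-01,2017-06-01,2018-06-01\n"
)

def load(buf):
    # Hypothetical helper; no index_col, so 'Org ID' stays a regular column.
    return pd.read_csv(buf, header=None, names=name_cols, usecols=pull_cols)

df1 = load(file1)
df2 = load(file2)

# Inner join: keep only organisations whose Org ID and Org Name appear in both files.
merged = pd.merge(df1, df2, how='inner', on=['Org ID', 'Org Name'])

# Non-key columns present in both frames receive _x (left) and _y (right) suffixes.
print(merged[['Org ID', 'Org Name', 'Join Date_x', 'Join Date_y']])

# Write the combined result as one CSV (an in-memory buffer here; a file path works too).
out = io.StringIO()
merged.to_csv(out, index=False)
```

With real files, the buffers would be replaced by paths, and the encoding="ISO-8859-1" argument from the question would likely still be needed.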

Merge two CSV files into one by ID with pandas
