Selecting specific columns from multiple CSV files in Python pandas

Time: 2018-06-06 11:15:51

Tags: python pandas csv join

I am trying to create a modified CSV file from several small CSV files. field1.csv and field2.csv have one column in common. The final file, final.csv, should contain column["ID"], column["NAME"] and column["ACC"] from field1.csv, plus column["SCORE"] and column["TEAM"] from field2.csv. If a value is missing, the field should be left blank. I am using Python pandas.

field1.csv:

"ID","NAME","ACC","POINT"
"123","TRR","OOP","64"
"124","DEE","OOP","78"
"125","EWR","PLO","98"

field2.csv:

"ID","SCORE","TEAM","END"
"111","92","BCC","0"
"121","80","CSS","1"
"123","87","BCC","0"

1 answer:

Answer 0 (score: 0)

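I think you need the index_col parameter to turn the first column into the index and usecols to filter the columns, followed by join, which performs a left join by default: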

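import pandas as pd

# Read only the needed columns; the first of them (ID) becomes the index
df1 = pd.read_csv("field1.csv", index_col=[0], usecols=["ID","NAME","ACC"])
df2 = pd.read_csv("field2.csv", index_col=[0], usecols=["ID","SCORE","TEAM"])

# join aligns the two frames on their index and is a left join by default
finaldf = df1.join(df2)
print(finaldf)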
    NAME  ACC  SCORE TEAM
ID                       
123  TRR  OOP   87.0  BCC
124  DEE  OOP    NaN  NaN
125  EWR  PLO    NaN  NaN
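To try this without creating files, here is a minimal sketch that feeds the sample rows from the question through io.StringIO. The names FIELD1 and FIELD2 are placeholders made up for this example, and index_col="ID" is used here in place of index_col=[0], which should select the same column.

```python
import io
import pandas as pd

# Sample rows copied from the question; StringIO stands in for the files on disk.
FIELD1 = '''"ID","NAME","ACC","POINT"
"123","TRR","OOP","64"
"124","DEE","OOP","78"
"125","EWR","PLO","98"
'''
FIELD2 = '''"ID","SCORE","TEAM","END"
"111","92","BCC","0"
"121","80","CSS","1"
"123","87","BCC","0"
'''

# usecols drops POINT and END at read time; ID becomes the index.
df1 = pd.read_csv(io.StringIO(FIELD1), index_col="ID", usecols=["ID", "NAME", "ACC"])
df2 = pd.read_csv(io.StringIO(FIELD2), index_col="ID", usecols=["ID", "SCORE", "TEAM"])

# Left join on the index: every row of df1 is kept, unmatched IDs get NaN.
finaldf = df1.join(df2)
print(finaldf)
```

IDs 111 and 121 exist only in field2.csv, so they do not appear in the result. SCORE becomes a float column because it now holds NaN.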

Another possible solution is to filter the columns by subsetting before the join:

df1 = pd.read_csv("field1.csv", index_col=[0])

df2 = pd.read_csv("field2.csv", index_col=[0])

finaldf = df1[["NAME","ACC"]].join(df2[["SCORE","TEAM"]]) 
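For comparison, the same left join can be sketched with merge instead of join. This is a different call from the one used in this answer, and it keeps ID as an ordinary column rather than the index. The sample rows are copied from the question, and the names FIELD1, FIELD2 and merged are made up for this example.

```python
import io
import pandas as pd

# Sample rows copied from the question; StringIO stands in for the files on disk.
FIELD1 = '''"ID","NAME","ACC","POINT"
"123","TRR","OOP","64"
"124","DEE","OOP","78"
"125","EWR","PLO","98"
'''
FIELD2 = '''"ID","SCORE","TEAM","END"
"111","92","BCC","0"
"121","80","CSS","1"
"123","87","BCC","0"
'''

df1 = pd.read_csv(io.StringIO(FIELD1))
df2 = pd.read_csv(io.StringIO(FIELD2))

# Subset both frames, then left-merge on the shared ID column.
merged = df1[["ID", "NAME", "ACC"]].merge(
    df2[["ID", "SCORE", "TEAM"]], on="ID", how="left"
)
print(merged)
```

Because ID stays a regular column in this variant, writing the result with index=False still keeps ID in the output file.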

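Finally, write the result to a file. Because ID is the index here, keep the index when writing (passing index=False would drop the ID column). Missing values are written as empty fields by default:

finaldf.to_csv('final.csv')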
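To see what actually lands in the file, here is a small sketch that writes to an in-memory buffer instead of final.csv. The frame is rebuilt by hand from the joined output shown above, which is an assumption made to keep the example self-contained. It compares writing with and without the index.

```python
import io
import pandas as pd

# Joined result rebuilt by hand from the output shown earlier, with ID as the index.
finaldf = pd.DataFrame(
    {
        "NAME": ["TRR", "DEE", "EWR"],
        "ACC": ["OOP", "OOP", "PLO"],
        "SCORE": [87.0, None, None],
        "TEAM": ["BCC", None, None],
    },
    index=pd.Index([123, 124, 125], name="ID"),
)

# Index kept: ID is written as the first column, missing values as empty fields.
buf = io.StringIO()
finaldf.to_csv(buf)
with_index = buf.getvalue()
print(with_index)

# index=False: the ID index is not written at all.
without_index = finaldf.to_csv(index=False)  # returns a string when no path is given
print(without_index)
```

With ID held in the index, index=False leaves the output without an ID column, so the index has to be kept (or turned back into a column with reset_index) if ID should appear in final.csv.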