How do I merge columns from three CSV files?

Time: 2017-05-03 12:15:17

Tags: python pandas

I have three CSV files:

File 1

id,code
1,a
2,b
3,c
4,d

File 2

no,count,sum,class
3,567,55562,Y
5,673,66259,L
1,674,78256,Y
4,344,56789,Y

File 3

record,mean,median
3,5437,553
2,67233,664
1,67234,785
4,34423,556

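I want to merge count and sum from file 2 into file 1 where id and no match, and merge mean and median from file 3 into file 1 where id and record match. I tried the following code, but the final output file has many empty fields, even for rows whose id matches.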

df = pd.concat([file1, file2,file3], join_axes=[df.index])
df= df.drop["class"]

1 answer:

Answer 0 (score: 1)

I think you need to set the first column as the index in read_csv:

import pandas as pd
# pandas.compat.StringIO was removed from newer pandas releases; use the standard library
from io import StringIO

temp=u"""id,code
1,a
2,b
3,c
4,d"""
# after testing, replace 'StringIO(temp)' with 'filename.csv'
file1 = pd.read_csv(StringIO(temp), index_col=[0])
print (file1)

temp=u"""
no,count,sum,class
3,567,55562,Y
5,673,66259,L
1,674,78256,Y
4,344,56789,Y"""
# after testing, replace 'StringIO(temp)' with 'filename.csv'
file2 = pd.read_csv(StringIO(temp), index_col=[0])
print (file2)

temp=u"""
record,mean,median
3,5437,553
2,67233,664
1,67234,785
4,34423,556"""
# after testing, replace 'StringIO(temp)' with 'filename.csv'
file3 = pd.read_csv(StringIO(temp), index_col=[0])

print (file3)
df = pd.concat([file1, file2,file3], axis=1).drop("class", axis=1)
print (df)
  code  count      sum     mean  median
1    a  674.0  78256.0  67234.0   785.0
2    b    NaN      NaN  67233.0   664.0
3    c  567.0  55562.0   5437.0   553.0
4    d  344.0  56789.0  34423.0   556.0
5  NaN  673.0  66259.0      NaN     NaN
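The output above depends on concat with axis=1 aligning rows by index label rather than by position. The following is a minimal sketch of that difference, using only files 1 and 2 from the question; the variable names csv1, csv2, by_position, and by_label are illustrative and not part of the original answer.

```python
from io import StringIO
import pandas as pd

csv1 = "id,code\n1,a\n2,b\n3,c\n4,d"
csv2 = "no,count,sum,class\n3,567,55562,Y\n5,673,66259,L\n1,674,78256,Y\n4,344,56789,Y"

# Default 0..n-1 index: concat pairs rows by position,
# so id 1 ends up next to no 3 in the first row.
by_position = pd.concat(
    [pd.read_csv(StringIO(csv1)), pd.read_csv(StringIO(csv2))], axis=1
)

# First column as index: concat pairs rows by the key value,
# so the row labelled 1 carries the count from no 1.
by_label = pd.concat(
    [pd.read_csv(StringIO(csv1), index_col=0),
     pd.read_csv(StringIO(csv2), index_col=0)],
    axis=1,
)

print(by_position)
print(by_label)
```

Labels present in only one file (2 and 5 here) are kept and filled with NaN, which is the default outer-join behaviour of concat.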

If you don't set the index in read_csv, you need to add set_index:

import pandas as pd
# pandas.compat.StringIO was removed from newer pandas releases; use the standard library
from io import StringIO

temp=u"""id,code
1,a
2,b
3,c
4,d"""
# after testing, replace 'StringIO(temp)' with 'filename.csv'
file1 = pd.read_csv(StringIO(temp))
print (file1)

temp=u"""
no,count,sum,class
3,567,55562,Y
5,673,66259,L
1,674,78256,Y
4,344,56789,Y"""
# after testing, replace 'StringIO(temp)' with 'filename.csv'
file2 = pd.read_csv(StringIO(temp))
print (file2)

temp=u"""
record,mean,median
3,5437,553
2,67233,664
1,67234,785
4,34423,556"""
# after testing, replace 'StringIO(temp)' with 'filename.csv'
file3 = pd.read_csv(StringIO(temp))

print (file3)
# parentheses let the method chain continue onto the next line
df = (pd.concat([file1.set_index('id'),
                 file2.set_index('no'),
                 file3.set_index('record')], axis=1)
        .drop("class", axis=1))
print (df)
  code  count      sum     mean  median
1    a  674.0  78256.0  67234.0   785.0
2    b    NaN      NaN  67233.0   664.0
3    c  567.0  55562.0   5437.0   553.0
4    d  344.0  56789.0  34423.0   556.0
5  NaN  673.0  66259.0      NaN     NaN
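The numeric columns print as 674.0 rather than 674 because a column that contains NaN is stored as float. If integer values matter, one option is the nullable Int64 dtype. This is a minimal sketch, assuming a pandas version that supports that dtype; the name df_int is illustrative and only files 1 and 2 are used to keep it short.

```python
from io import StringIO
import pandas as pd

file1 = pd.read_csv(StringIO("id,code\n1,a\n2,b\n3,c\n4,d"), index_col=0)
file2 = pd.read_csv(
    StringIO("no,count,sum,class\n3,567,55562,Y\n5,673,66259,L\n1,674,78256,Y\n4,344,56789,Y"),
    index_col=0,
)

df = pd.concat([file1, file2], axis=1).drop("class", axis=1)
print(df.dtypes)  # count and sum are float64 because of the missing rows

# Nullable integers keep whole numbers and mark missing values as <NA>
df_int = df.astype({"count": "Int64", "sum": "Int64"})
print(df_int)
```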

Or, for an inner join, add join='inner' to concat:

df = pd.concat([file1.set_index('id'),
                file2.set_index('no'),
                file3.set_index('record')], join='inner', axis=1).drop("class", axis=1)
print (df)
  code  count    sum   mean  median
3    c    567  55562   5437     553
1    a    674  78256  67234     785
4    d    344  56789  34423     556
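If the goal is to keep every row of file 1 and only attach values where the keys match, a left join with DataFrame.merge is an alternative to concat. This is a sketch of a different technique from the one in the answer, using the sample data from the question. Unlike the outer concat, it drops no 5 (which has no id in file 1), and unlike the inner join, it keeps id 2.

```python
from io import StringIO
import pandas as pd

file1 = pd.read_csv(StringIO("id,code\n1,a\n2,b\n3,c\n4,d"))
file2 = pd.read_csv(
    StringIO("no,count,sum,class\n3,567,55562,Y\n5,673,66259,L\n1,674,78256,Y\n4,344,56789,Y")
)
file3 = pd.read_csv(
    StringIO("record,mean,median\n3,5437,553\n2,67233,664\n1,67234,785\n4,34423,556")
)

df = (
    file1
    # take only the key and the wanted columns from file 2, so "class" never enters
    .merge(file2[["no", "count", "sum"]], how="left", left_on="id", right_on="no")
    .merge(file3, how="left", left_on="id", right_on="record")
    # the right-hand key columns duplicate "id" after the merge
    .drop(columns=["no", "record"])
)
print(df)
```

Here id stays an ordinary column and the rows follow the order of file 1; count and sum are NaN for id 2 because file 2 has no matching no.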