Join two .csv files on a reference column in pandas

Time: 2016-05-17 11:42:51

Tags: python csv dictionary pandas

I have 2 files of different sizes (customer_id appears in a different order in the two files):

data = pd.read_csv('data.csv')

id    name    country   town     customer_id
xxxx  Anna     UK       London   sahdghkl
yyyy  Maria    USA      Huston   avrnnfgs
cccc  Peter    FR       Paris    eesfawsd

data2 = pd.read_csv('data2.csv')

customer_id  card_id   bank   date
sahdghkl     5975845   aaaaa  20000101
avrnnfgs     1122255   bbbbb  20010101
eesfawsd     3366552   ccccc  20020101

I want to get this output:

result
id    name    country   town     customer_id  card_id   bank   date
xxxx  Anna     UK       London   sahdghkl     5975845   aaaaa  20000101 
yyyy  Maria    USA      Huston   avrnnfgs     1122255   bbbbb  20010101
cccc  Peter    FR       Paris    eesfawsd     3366552   ccccc  20020101

1 answer:

Answer 0: (score: 0)

Try using pandas.merge.

Create the dataframes:

import io
import pandas as pd

# First dataframe, parsed from whitespace-separated text
temp = u"""id    name    country   town     customer_id
xxxx  Anna     UK       London   sahdghkl
yyyy  Maria    USA      Huston   avrnnfgs
cccc  Peter    FR       Paris    eesfawsd"""
data = pd.read_csv(io.StringIO(temp), header=0, sep=r'\s+')

# Second dataframe, parsed the same way
temp = u"""customer_id  card_id   bank   date
sahdghkl     5975845   aaaaa  20000101
avrnnfgs     1122255   bbbbb  20010101
eesfawsd     3366552   ccccc  20020101"""
data2 = pd.read_csv(io.StringIO(temp), header=0, sep=r'\s+')

# Join the two dataframes on the shared customer_id column
df = pd.merge(data, data2, on='customer_id')
print(df)

     id   name country    town customer_id  card_id   bank      date
0  xxxx   Anna      UK  London    sahdghkl  5975845  aaaaa  20000101
1  yyyy  Maria     USA  Huston    avrnnfgs  1122255  bbbbb  20010101
2  cccc  Peter      FR   Paris    eesfawsd  3366552  ccccc  20020101
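
The question mentions that customer_id appears in a different order in the two files. As a minimal sketch, the example below suggests that pd.merge pairs rows by key value rather than by row position. The shuffled row order of data2 and the reduced set of columns are assumptions made for illustration only.

```python
import pandas as pd

data = pd.DataFrame({
    'id': ['xxxx', 'yyyy', 'cccc'],
    'name': ['Anna', 'Maria', 'Peter'],
    'customer_id': ['sahdghkl', 'avrnnfgs', 'eesfawsd'],
})

# Assumption for illustration: data2 lists the customers in a different order
data2 = pd.DataFrame({
    'customer_id': ['eesfawsd', 'sahdghkl', 'avrnnfgs'],
    'card_id': [3366552, 5975845, 1122255],
})

# Rows are paired by the value of customer_id, not by their position
df = pd.merge(data, data2, on='customer_id')
print(df)
```

Each name should end up next to the card_id that shares its customer_id, regardless of how the rows of data2 were ordered.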

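If one of your two dataframes has more rows than the other and you want to keep all rows, add how = 'outer'. If you only want to keep the rows that appear in the dataframe, add: {{ 1}}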
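
The option elided at the end of the sentence above cannot be recovered here, so the sketch below only compares `how` values that pandas.merge is known to accept ('inner', which is the default, 'outer', and 'left'). The mismatched rows, including the customer 'zzzzzzzz', are hypothetical and exist only to make the difference visible.

```python
import pandas as pd

data = pd.DataFrame({
    'name': ['Anna', 'Maria', 'Peter'],
    'customer_id': ['sahdghkl', 'avrnnfgs', 'eesfawsd'],
})

# Assumption for illustration: data2 has no row for Peter's customer_id
# and contains one hypothetical customer that is missing from data
data2 = pd.DataFrame({
    'customer_id': ['sahdghkl', 'avrnnfgs', 'zzzzzzzz'],
    'bank': ['aaaaa', 'bbbbb', 'ddddd'],
})

# Default (how='inner'): only customer_ids found in both dataframes
inner = pd.merge(data, data2, on='customer_id')

# how='outer': every row from both dataframes, NaN where there is no match
outer = pd.merge(data, data2, on='customer_id', how='outer')

# how='left': every row from data, NaN where data2 has no match
left = pd.merge(data, data2, on='customer_id', how='left')

print(len(inner), len(outer), len(left))
```

Which variant is appropriate depends on whether unmatched customers should be dropped or kept with missing values.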