Join two CSVs by ID

Time: 2013-12-05 23:51:33

Tags: python csv pandas

I have two CSV files, for example:

first.csv:

1,A B C
2,A D
3,T Q

second.csv:

1,
2,P A
3,
4,A O

Is it possible to use pandas to join these two CSVs into a CSV with a similar format?

The output CSV should be:

1,A B C
2,A D P
3,T Q
4,A O

3 answers:

Answer 0: (score: 1)

Loop over the rows of both CSVs in parallel and combine each pair of rows into a set

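import csv

# Assumes both files list the same IDs in the same order;
# rows that exist only in 2.csv (such as ID 4) are not written.
with open('1.csv', newline='') as f1, open('2.csv', newline='') as f2, \
        open('output.csv', 'w', newline='') as outfile:
    writer = csv.writer(outfile)
    second_csv = csv.reader(f2)
    for first_row in csv.reader(f1):
        second_row = next(second_csv)
        # Union the space-separated items from the second column of both rows
        items = set(first_row[1].split()) | set(second_row[1].split())
        # A set is unordered, so sort the items for deterministic output
        writer.writerow([first_row[0], ' '.join(sorted(items))])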

Answer 1: (score: 1)

Try:

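import pandas as pd
# The files have no header row, so name the columns explicitly
first = pd.read_csv('first.csv', header=None, names=['id', 'items'])
second = pd.read_csv('second.csv', header=None, names=['id', 'items'])
# An outer join on id keeps IDs that appear in only one file (e.g. 4);
# the result has items_x and items_y columns that still need combining
third = pd.merge(first, second, on='id', how='outer')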

Pandas is king when it comes to loading CSV data and manipulating it afterwards.
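The merge alone does not yet produce the single combined column the question asks for. One possible way to get there is sketched below. It is an illustration only: the column names `id` and `items` and the helper `merge_items` are made up for this example, and it stacks the files and groups by ID rather than using `pd.merge`. In-memory buffers stand in for `first.csv` and `second.csv` so the snippet runs without any files.

```python
import io
import pandas as pd

def merge_items(sources):
    """Union the space-separated items per ID across header-less CSVs.

    `sources` may be file paths or file-like objects (hypothetical helper).
    """
    frames = [
        pd.read_csv(src, header=None, names=["id", "items"], dtype={"items": str})
        for src in sources
    ]
    combined = pd.concat(frames, ignore_index=True)
    # Rows such as "1," have no items and are read as NaN
    combined["items"] = combined["items"].fillna("")

    def union(values):
        # dict.fromkeys drops duplicates while keeping first-seen order
        tokens = " ".join(values).split()
        return " ".join(dict.fromkeys(tokens))

    return combined.groupby("id", sort=True)["items"].agg(union).reset_index()

# Sample data from the question
first = io.StringIO("1,A B C\n2,A D\n3,T Q\n")
second = io.StringIO("1,\n2,P A\n3,\n4,A O\n")

merged = merge_items([first, second])
print(merged.to_csv(header=False, index=False))
```

With real files, something like `merge_items(['first.csv', 'second.csv']).to_csv('output.csv', header=False, index=False)` should write the merged result.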

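Answer 2: (score: 0)

Here is one that does what you want; it does not use Pandas, but it also does not assume the rows of the files are in any particular order.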

import csv
from itertools import chain
from collections import defaultdict

def load_csv(fname):
    # Python 3: open in text mode with newline='', as the csv module expects
    with open(fname, newline='') as inf:
        in_csv = csv.reader(inf)
        for row in in_csv:
            yield row

def write_csv(fname, rows):
    with open(fname, 'w', newline='') as outf:
        csv.writer(outf).writerows(rows)

def main():
    # load data from several .csv files
    files = ['first.csv', 'second.csv']
    data = defaultdict(set)
    for key,items in chain(*(load_csv(f) for f in files)):
        data[key].update(items.split())

    # reorder data for output
    rows = ([key, ' '.join(sorted(data[key]))] for key in sorted(data.keys()))

    # write merged .csv file
    write_csv('output.csv', rows)

if __name__=="__main__":
    main()
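Note that sorting the items alphabetically would write row 3 as "Q T", while the output in the question shows "T Q". If first-seen order matters, a variant along the following lines might help. It is only a sketch: `merge_rows` is a hypothetical helper, it relies on Python 3.7+ dicts keeping insertion order, and it assumes every ID is an integer so the rows can be sorted numerically.

```python
import csv
import io
from collections import defaultdict

def merge_rows(row_iterables):
    """Union space-separated items per ID, keeping first-seen order."""
    data = defaultdict(dict)  # dict keys act as an insertion-ordered set
    for rows in row_iterables:
        for row in rows:
            if not row:
                continue  # skip blank lines
            key = row[0]
            items = row[1] if len(row) > 1 else ""
            data[key].update(dict.fromkeys(items.split()))
    # Sort IDs numerically; this assumes every ID is an integer
    return [[key, " ".join(data[key])] for key in sorted(data, key=int)]

# Sample data from the question, held in memory instead of files
first = io.StringIO("1,A B C\n2,A D\n3,T Q\n")
second = io.StringIO("1,\n2,P A\n3,\n4,A O\n")

rows = merge_rows([csv.reader(first), csv.reader(second)])
for row in rows:
    print(",".join(row))
```

The returned rows can be passed to `csv.writer(...).writerows(rows)` to produce the output file.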