I have two CSV files, for example:
first.csv:
1,A B C
2,A D
3,T Q
second.csv:
1,
2,P A
3,
4,A O
Is it possible to use pandas to join these two CSVs into a CSV of a similar format?
The output CSV should be:
1,A B C
2,A D P
3,T Q
4,A O
Answer 0 (score: 1)
Loop over the rows of both CSVs in parallel and build a set from each pair of rows:
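import csv
from itertools import zip_longest

with open('first.csv', newline='') as f1:
    with open('second.csv', newline='') as f2:
        with open('output.csv', 'w', newline='') as outfile:
            writer = csv.writer(outfile)
            # Rows are paired by position; zip_longest pads the shorter file.
            rows = zip_longest(csv.reader(f1), csv.reader(f2), fillvalue=[])
            for first_row, second_row in rows:
                key = (first_row or second_row)[0]
                # Union of the space-separated items from both rows.
                items = set(' '.join(first_row[1:] + second_row[1:]).split())
                writer.writerow([key, ' '.join(sorted(items))])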
Answer 1 (score: 1)
Try:
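import pandas as pd

# The files have no header row, so name the columns explicitly.
first = pd.read_csv('first.csv', header=None, names=['key', 'items'])
second = pd.read_csv('second.csv', header=None, names=['key', 'items'])
# An outer join on the key keeps rows found in only one file (such as key 4).
third = pd.merge(first, second, on='key', how='outer')
# third now has columns key, items_x, items_y; the item columns still need combining.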
Pandas is very good at loading CSV data and manipulating it afterwards.
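For what it's worth, here is a rough sketch of how the whole thing could be done in pandas: stack both files, group by the first column, and take the union of the space-separated items. The function name `merge_csv_items` and the column names `key` and `items` are made up for illustration, and it assumes neither file has a header row. Items keep the order in which they are first seen, which is what the expected output in the question shows.

```python
import io

import pandas as pd


def merge_csv_items(sources):
    """Union the space-separated items of rows sharing a key.

    `sources` may be file paths or file-like objects (hypothetical helper).
    """
    frames = [
        pd.read_csv(src, header=None, names=["key", "items"], dtype={"items": str})
        for src in sources
    ]
    combined = pd.concat(frames, ignore_index=True)
    # Empty second fields are read as missing values; treat them as no items.
    combined["items"] = combined["items"].fillna("")

    def union(series):
        # dict.fromkeys drops duplicates while keeping first-seen order.
        return " ".join(dict.fromkeys(" ".join(series).split()))

    return combined.groupby("key")["items"].agg(union).reset_index()


if __name__ == "__main__":
    # In-memory stand-ins for first.csv and second.csv from the question.
    first = io.StringIO("1,A B C\n2,A D\n3,T Q\n")
    second = io.StringIO("1,\n2,P A\n3,\n4,A O\n")
    merged = merge_csv_items([first, second])
    print(merged.to_csv(header=False, index=False))
```

With real files you would call `merge_csv_items(['first.csv', 'second.csv'])` and then `merged.to_csv('output.csv', header=False, index=False)`. Because the rows are grouped by key, the files do not need to have the same length or the same row order.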
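Answer 2 (score: 0)
Here is one that does what you want; it does not use Pandas, but it also does not assume the rows of the files are in any particular order.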
import csv
from itertools import chain
from collections import defaultdict

def load_csv(fname):
    # newline='' lets the csv module handle line endings itself (Python 3)
    with open(fname, newline='') as inf:
        in_csv = csv.reader(inf)
        for row in in_csv:
            yield row

def write_csv(fname, rows):
    with open(fname, 'w', newline='') as outf:
        csv.writer(outf).writerows(rows)

def main():
    # load data from several .csv files
    files = ['first.csv', 'second.csv']
    data = defaultdict(set)
    for key, items in chain.from_iterable(load_csv(f) for f in files):
        data[key].update(items.split())
    # reorder data for output (keys and items are sorted as strings)
    rows = ([key, ' '.join(sorted(data[key]))] for key in sorted(data.keys()))
    # write merged .csv file
    write_csv('output.csv', rows)

if __name__ == "__main__":
    main()
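To illustrate the point about row order, here is a small in-memory sketch of the same idea, using `io.StringIO` in place of real files and deliberately shuffled rows. The helper name `merge_rows` is hypothetical, and sorting the keys with `key=int` assumes every key is an integer (so that "10" would come after "2").

```python
import csv
import io
from collections import defaultdict


def merge_rows(*sources):
    """Union the space-separated items of every row that shares a key."""
    data = defaultdict(set)
    for source in sources:
        for row in csv.reader(source):
            if not row:  # skip blank lines
                continue
            key = row[0]
            items = row[1] if len(row) > 1 else ""
            data[key].update(items.split())
    # Assumption: keys are integers, so sort them numerically.
    return [[key, " ".join(sorted(data[key]))] for key in sorted(data, key=int)]


if __name__ == "__main__":
    # Same data as in the question, but with the rows out of order.
    first = io.StringIO("3,T Q\n1,A B C\n2,A D\n")
    second = io.StringIO("4,A O\n2,P A\n1,\n3,\n")
    for row in merge_rows(first, second):
        print(",".join(row))
```

Since the items are held in a set and sorted on output, row 3 comes out as `Q T` rather than `T Q`; keeping the original item order would need an ordered container instead of a set.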