Add data to CSV rows when a second CSV has the same first column

Time: 2016-12-16 16:31:02

Tags: python csv

I'm learning Python, so sorry for the silly question.

I have two files:

list.csv

john
mary
joanna
lucas
kate

db.csv

john^chief^portland
mary^secretary^ny
joanna^supervisor^washington

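What I want is to compare the two files and write an output sorted alphabetically by the first column. For names that are not in db, the second column should be None, like this: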

output.csv

joanna^supervisor^washington
john^chief^portland
kate^None
lucas^None
mary^secretary^ny

I started from this code, which I found on SO, and I'm struggling with it:

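import csv

# Assumed setup (not in the original snippet): reader21 reads list.csv,
# reader22 reads the '^'-delimited db.csv, writer23 writes output.csv.
with open('list.csv', newline='') as f21, \
        open('db.csv', newline='') as f22, \
        open('output.csv', 'w', newline='') as f23:
    reader21 = csv.reader(f21)
    reader22 = csv.reader(f22, delimiter='^')
    writer23 = csv.writer(f23, delimiter='^')

    masterlist = list(reader22)

    for hosts_row in reader21:
        results_row = hosts_row
        found = False
        # Scan the whole master list for a row with the same first column.
        for row, master_row in enumerate(masterlist, start=1):
            if hosts_row[0] == master_row[0]:
                results_row.append('FOUNDTHISLINE in master list (row '
                                   + str(row) + ')')
                found = True
                break
        if not found:
            results_row.append('THISLINENOTFOUND in master list')
        writer23.writerow(results_row)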

Please help me understand the best way to do this.

2 answers:

Answer 0 (score: 2)

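This is a perfect use case for the pandas library. I know you're just learning, but have a look at it for data manipulation (please ignore the prompt numbering :)).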

In [37]: list_df = pd.read_csv('list.csv', header=None)

In [38]: db_df = pd.read_csv('db.csv', sep='^', header=None)

In [51]: db_df
Out[51]:
        0           1           2
0    john       chief    portland
1    mary   secretary          ny
2  joanna  supervisor  washington


In [48]: list_df
Out[48]:
        0
0    john
1    mary
2  joanna
3   lucas
4    kate

In [52]: df = list_df.merge(db_df, how='left')

In [53]: df
Out[53]:
        0           1           2
0    john       chief    portland
1    mary   secretary          ny
2  joanna  supervisor  washington
3   lucas         NaN         NaN
4    kate         NaN         NaN

In [54]: df.sort_values(0)
Out[54]:
        0           1           2
2  joanna  supervisor  washington
0    john       chief    portland
4    kate         NaN         NaN
3   lucas         NaN         NaN
1    mary   secretary          ny

From there you can call df.to_csv to write the result back out as CSV.

to_csv documentation: http://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.to_csv.html
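Put together as one script, the steps above might look roughly like this. It is a sketch that assumes pandas is installed. merge_with_pandas is a hypothetical helper name, and the fillna('None') step is an addition (not part of the answer) to approximate the None marker from the question. The remaining NaN in column 2 is written as an empty field, so a missing name comes out as kate^None^.

```python
import pandas as pd


def merge_with_pandas(list_path, db_path, out_path):
    """Hypothetical helper wrapping the read / merge / sort / write steps."""
    # header=None: neither file has a header row, so columns are labelled 0, 1, 2.
    list_df = pd.read_csv(list_path, header=None)
    db_df = pd.read_csv(db_path, sep='^', header=None)

    # Left merge on the shared column label 0 keeps every name from list.csv;
    # names missing from db.csv get NaN in columns 1 and 2.
    df = list_df.merge(db_df, how='left').sort_values(0)

    # Assumption: mark missing job titles with the string 'None'. Column 2
    # stays NaN, which to_csv writes as an empty field (default na_rep='').
    df[1] = df[1].fillna('None')

    # '^' as separator, no header row, no index column.
    df.to_csv(out_path, sep='^', header=False, index=False)
```

Usage: merge_with_pandas('list.csv', 'db.csv', 'output.csv').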

Answer 1 (score: 2)

What you want is simple and efficient to do with just the csv module and Python's built-in data structures (lists and dictionaries):

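import csv

# Python 3: open CSV files in text mode with newline=''.
with open('list.csv', newline='') as csvfile:
    masterlist = sorted(row[0] for row in csv.reader(csvfile))

with open('db.csv', newline='') as csvfile:
    # Map each name to its remaining fields, e.g. 'john' -> ['chief', 'portland'].
    db = {row[0]: row[1:] for row in csv.reader(csvfile, delimiter='^')}

with open('output.csv', 'w', newline='') as csvfile:
    writer = csv.writer(csvfile, delimiter='^')
    for name in masterlist:
        # Known names get their db fields; unknown names get 'None' plus an empty field.
        writer.writerow([name] + db[name] if name in db else [name, 'None', ''])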

Contents of the resulting output.csv:

joanna^supervisor^washington
john^chief^portland
kate^None^
lucas^None^
mary^secretary^ny
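The question's expected output has kate^None with no trailing ^, while the code above writes an empty third field. A minimal variant that writes only two fields for missing names could look like this. It is wrapped in a hypothetical merge_names helper so the paths can be passed in, skips blank lines (which csv.reader returns as empty lists), and uses an explicit if/else instead of the one-line conditional expression.

```python
import csv


def merge_names(list_path, db_path, out_path):
    """Hypothetical helper: write every name from list_path, sorted, followed
    by its db_path fields, or a single 'None' field if the name is missing."""
    # Collect the names, skipping blank lines (csv.reader yields [] for them).
    with open(list_path, newline='') as f:
        names = sorted(row[0] for row in csv.reader(f) if row)

    # Map name -> remaining fields, e.g. 'john' -> ['chief', 'portland'].
    with open(db_path, newline='') as f:
        db = {row[0]: row[1:] for row in csv.reader(f, delimiter='^') if row}

    with open(out_path, 'w', newline='') as f:
        writer = csv.writer(f, delimiter='^')
        for name in names:
            if name in db:
                writer.writerow([name] + db[name])
            else:
                # Two fields only, so the line reads 'kate^None' without a trailing '^'.
                writer.writerow([name, 'None'])
```

Usage: merge_names('list.csv', 'db.csv', 'output.csv').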