Compare two CSV files and search for similar items

Time: 2016-08-09 22:10:54

Tags: python csv concatenation

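I'm still new to Python, and I'm trying to adapt the code from this post so that it works for me.

The difference between that post and what I'm looking for is that, when a matching 'Signature' is found in both files, I want to concatenate the full contents of the matching rows from hosts.csv and masterlist.csv.

So if hosts.csv looks like this: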

Path    Filename    Size    Signature
C:\     a.txt       14kb    012345
D:\     b.txt       99kb    678910
C:\     c.txt       44kb    111213

and masterlist.csv looks like this:

Signature    Name    State
012345       Joe     CT
567890       Sue     MA
111222       Dan     MD

I've been tinkering with the code that Martijn Pieters posted in reply to Serk's post, and his code gets me most of the way there.

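import csv
import time

# Timestamped output name (not used below yet; the code still writes results.csv).
timestr = time.strftime("%Y%m%d_%H%M")
outputfile = "Results_" + timestr + ".csv"

# Python 3: open CSV files in text mode with newline='' (the Python 2 version used 'rb' and 'wb').
with open('masterlist.csv', newline='') as master:
    # Map each signature (first column) to its row index in masterlist.csv.
    master_indices = dict((r[0], i) for i, r in enumerate(csv.reader(master)))

with open('hosts.csv', newline='') as hosts:
    with open('results.csv', 'w', newline='') as results:
        reader = csv.reader(hosts)
        writer = csv.writer(results)

        # Copy the header row and append the new RESULTS column.
        writer.writerow(next(reader, []) + ['RESULTS'])

        for row in reader:
            # The signature is the fourth column of hosts.csv.
            index = master_indices.get(row[3])
            if index is not None:
                message = 'FOUND in (row {})'.format(index)
            else:
                message = 'NOT FOUND'
            writer.writerow(row + [message])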

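Instead of only adding a RESULTS column when a matching signature is found, as Serk did, how can I pull the corresponding rows from masterlist.csv and hosts.csv and concatenate the two in results.csv? The desired output file would look like this: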

Path    Filename    Size    RESULTS          Signature    Name  State    
C:\     a.txt       14kb    FOUND in Row 1   012345       Joe   CT
D:\     b.txt       99kb    FOUND in Row 2   678910       Sue   MA
C:\     c.txt       44kb    NOT FOUND        111213

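Thanks in advance. The answers here have already helped me with most of the solutions I've been looking for!

1 Answer:

Answer 0 (score: 3)

Use pandas.read_csv and merge on the "Signature" column: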

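import pandas as pd

# dtype=str keeps signatures such as 012345 as text, so leading zeros survive.
hosts_df = pd.read_csv("hosts.csv", dtype=str)
masterlist_df = pd.read_csv("masterlist.csv", dtype=str)
# An outer merge keeps the rows from both files, matched or not.
results = masterlist_df.merge(hosts_df, on="Signature", how="outer")
results.to_csv("results.csv")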
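
If you would rather stay with the csv module from the question, the following is a minimal sketch, not a definitive implementation. It assumes Python 3, comma-separated files, and that Signature is the fourth column of hosts.csv and the first column of masterlist.csv, as in the sample data. The function name join_on_signature and its parameters are hypothetical. Rather than storing only the row index for each signature, it stores the whole master row so that the row can be appended to the matching hosts row.

```python
import csv

HOSTS_SIGNATURE_COLUMN = 3   # assumption: Signature is the 4th column of hosts.csv
MASTER_SIGNATURE_COLUMN = 0  # assumption: Signature is the 1st column of masterlist.csv


def join_on_signature(hosts_path, master_path, out_path):
    """Write every hosts row, followed by the matching master row if there is one."""
    with open(master_path, newline="") as master:
        reader = csv.reader(master)
        master_header = next(reader, [])
        # Map signature -> (data row number, full row); row 1 is the first data row.
        # If a signature repeats in the master list, the first occurrence wins.
        master_rows = {}
        for number, row in enumerate(reader, start=1):
            if row:
                master_rows.setdefault(row[MASTER_SIGNATURE_COLUMN], (number, row))

    with open(hosts_path, newline="") as hosts, \
            open(out_path, "w", newline="") as results:
        reader = csv.reader(hosts)
        writer = csv.writer(results)

        # Header: hosts columns without Signature, then RESULTS, then the master columns.
        hosts_header = next(reader, [])
        writer.writerow(hosts_header[:HOSTS_SIGNATURE_COLUMN] + ["RESULTS"] + master_header)

        for row in reader:
            if not row:
                continue  # skip blank lines
            signature = row[HOSTS_SIGNATURE_COLUMN]
            prefix = row[:HOSTS_SIGNATURE_COLUMN]
            match = master_rows.get(signature)
            if match is None:
                # No master row to append; keep the signature so the row stays identifiable.
                writer.writerow(prefix + ["NOT FOUND", signature])
            else:
                number, master_row = match
                writer.writerow(prefix + ["FOUND in Row {}".format(number)] + master_row)
```

Calling join_on_signature("hosts.csv", "masterlist.csv", "results.csv") would then write one output row per hosts row, with the columns in the order shown in the question's desired output. Reading everything with the csv module also keeps each signature as a string, so leading zeros are preserved.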