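Python: compare 3 columns in 2 csv files and output if they are equal

Time: 2017-09-25 15:08:55

Tags: python csv python-3.5

I have two CSV files that I am trying to compare in order to find the matching items. The first file, hosts.csv, looks like this: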

Path    Filename    Size    Signature
C:\     a.txt       14kb    012345
D:\     b.txt       99kb    678910
C:\     c.txt       44kb    111213

The second file, masterlist.csv, looks like this:

Filename    Signature
b.txt       678910
x.txt       111213
b.txt       777777
c.txt       999999

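As you can see, the rows do not line up, and masterlist.csv is always larger than hosts.csv. The only part I want to search on is the Signature column. I know this would look something like:

hosts[3] == masterlist[1]

I am looking for a solution that gives me something like the following (basically the hosts.csv file with a new RESULTS column):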

Path    Filename    Size    Signature    RESULTS
C:\     a.txt       14kb    012345       NOT FOUND in masterlist
D:\     b.txt       99kb    678910       FOUND in masterlist (row 1)
C:\     c.txt       44kb    111213       FOUND in masterlist (row 2)

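I have searched through existing posts and found similar questions, but I still don't fully understand the answers, as I am still learning Python.

Edit: I am using Python 3.5.

3 answers:

Answer 0: (score: 0)

You can try this: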

import csv

# Read both files, skipping the header row.
# Assumes the files are comma-delimited.
with open('masterlist.csv', newline='') as f:
    masterlist = list(csv.reader(f))[1:]
with open('hosts.csv', newline='') as f:
    hosts = list(csv.reader(f))[1:]

# Signature column of the master list, in file order.
signatures = [row[1] for row in masterlist]

final_result = [["Path", "Filename", "Size", "Signature", "RESULTS"]]
for path, filename, size, signature in hosts:
    if signature in signatures:
        # Data rows are numbered from 1, as in the expected output.
        row_number = signatures.index(signature) + 1
        result = "FOUND in masterlist (row {})".format(row_number)
    else:
        result = "NOT FOUND in masterlist"
    final_result.append([path, filename, size, signature, result])

with open('new_host.csv', 'w', newline='') as f:
    csv.writer(f).writerows(final_result)
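Searching a list for every host row rescans the master list each time. A minimal sketch of the same idea with a dictionary built once, assuming the rows have already been read into lists without their headers; the function name annotate_hosts and the inline sample rows are illustrative, not part of the original answer:

```python
def annotate_hosts(host_rows, master_rows):
    """Return host rows with a RESULTS value appended.

    host_rows:   iterable of [path, filename, size, signature] (no header)
    master_rows: iterable of [filename, signature] (no header)
    """
    # Map each signature to the first 1-based data row where it appears.
    first_row = {}
    for row_number, (_filename, signature) in enumerate(master_rows, start=1):
        first_row.setdefault(signature, row_number)

    annotated = []
    for path, filename, size, signature in host_rows:
        if signature in first_row:
            result = "FOUND in masterlist (row {})".format(first_row[signature])
        else:
            result = "NOT FOUND in masterlist"
        annotated.append([path, filename, size, signature, result])
    return annotated


# Sample rows copied from the question (hypothetical in-memory stand-ins for the files).
hosts = [["C:\\", "a.txt", "14kb", "012345"],
         ["D:\\", "b.txt", "99kb", "678910"],
         ["C:\\", "c.txt", "44kb", "111213"]]
master = [["b.txt", "678910"], ["x.txt", "111213"],
          ["b.txt", "777777"], ["c.txt", "999999"]]

for row in annotate_hosts(hosts, master):
    print(row)
```

Because signatures stay strings throughout, a value such as 012345 keeps its leading zero.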

Answer 1: (score: 0)

A solution using csv.DictReader and csv.DictWriter objects:

import csv

with open('hosts.csv', 'r') as hosts, open('masterlist.csv', 'r') as mlist, \
        open('result.csv', 'w', newline='') as res:
    host_reader = csv.DictReader(hosts, delimiter=' ', skipinitialspace=True)
    mlist_reader = csv.DictReader(mlist, delimiter=' ', skipinitialspace=True)
    writer = csv.DictWriter(res, fieldnames=host_reader.fieldnames + ['Result'], delimiter='\t')

    mlist_data = {r['Signature']: mlist_reader.line_num-1 for r in mlist_reader}
    fmt = '{0}FOUND in masterlist{1}'   # preparing output format for `Result` field

    writer.writeheader()    # writing header
    for r in host_reader:
        if r['Signature'] in mlist_data:
            r['Result'] = fmt.format("", " (row " + str(mlist_data[r['Signature']]) + ")")
        else:
            r['Result'] = fmt.format("NOT ", "")
        writer.writerow(r)

result.csv contents:

Path    Filename    Size    Signature    Result
C:\     a.txt       14kb    012345       NOT FOUND in masterlist
D:\     b.txt       99kb    678910       FOUND in masterlist (row 1)
C:\     c.txt       44kb    111213       FOUND in masterlist (row 2)
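The delimiter=' ' and skipinitialspace=True arguments are what let the csv module read columns padded with runs of spaces. A small sketch of that behaviour on an in-memory sample, assuming the files really are space-aligned as shown in the question and have no trailing spaces; the sample string is illustrative:

```python
import csv
import io

# Two space-aligned rows in the layout shown in the question.
sample = (
    "Path    Filename    Size    Signature\n"
    "C:\\     a.txt       14kb    012345\n"
    "D:\\     b.txt       99kb    678910\n"
)

# With a space delimiter, skipinitialspace=True swallows the padding
# between columns instead of producing empty fields.
reader = csv.DictReader(io.StringIO(sample), delimiter=' ', skipinitialspace=True)
rows = list(reader)

print(reader.fieldnames)
print(rows[0]['Signature'])
```

If the real files are comma-separated, these two arguments should be dropped.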

Answer 2: (score: 0)

I always like to use pandas DataFrames for this kind of work, because pandas provides a variety of functions for saving and editing .csv files.

import pandas as pd

# index_col=0: the sample files carry a leading index column.
df = pd.read_csv('1.csv', index_col=0)
df2 = pd.read_csv('2.csv', index_col=0)
df['result'] = '0'  # string placeholder meaning "not found"
for i in range(len(df['signature'])):
    for j in range(len(df2['signature'])):
        if df['signature'][i] == df2['signature'][j]:
            # Record every row of 2.csv that carries this signature.
            df.loc[i, 'result'] = "found in '2.csv' at row " + str(
                df2.signature[df2.signature == df2['signature'][j]].index.tolist())
            break
df.to_csv('out.csv')

Here 1.csv is hosts.csv and 2.csv is masterlist.csv, and the whole output is saved as out.csv. The output looks like this:

  path filename  signature                          result
0  C:\    a.txt      12345                               0
1  D:\    b.txt     678910     found in '2.csv' at row [0]
2  C:\    c.txt     111213  found in '2.csv' at row [1, 4]

My .csv files look like this.

First, 1.csv:

  path filename  signature
0  C:\    a.txt      12345
1  D:\    b.txt     678910
2  C:\    c.txt     111213

Second, 2.csv:

  filename  signature
0    b.txt     678910
1    x.txt     111213
2    b.txt     777777
3    c.txt     999999
4    b.txt     111213

This way I can see whether a signature occurs more than once in 2.csv, and at which rows each occurrence was found.
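For comparison, the same "every occurrence" lookup can be sketched with only the standard library. This is not the answer's pandas code, just an illustration under assumptions: the data is comma-separated, rows are numbered from 0 as in the pandas output, and the inline strings stand in for 1.csv and 2.csv. Signatures are kept as strings, so 012345 keeps its leading zero.

```python
import csv
import io
from collections import defaultdict

# Inline copies of the sample data; real code would open 2.csv and 1.csv instead.
master_text = """filename,signature
b.txt,678910
x.txt,111213
b.txt,777777
c.txt,999999
b.txt,111213
"""
hosts_text = """path,filename,signature
C:\\,a.txt,012345
D:\\,b.txt,678910
C:\\,c.txt,111213
"""

# signature -> list of 0-based data rows in the master list
rows_by_signature = defaultdict(list)
for row_number, row in enumerate(csv.DictReader(io.StringIO(master_text))):
    rows_by_signature[row["signature"]].append(row_number)

results = {}
for row in csv.DictReader(io.StringIO(hosts_text)):
    signature = row["signature"]
    # .get avoids creating empty entries for unknown signatures.
    matches = rows_by_signature.get(signature, [])
    results[signature] = matches
    print(row["filename"], signature, matches if matches else "not found")
```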