Using Python

Time: 2017-11-28 12:09:29

Tags: python csv export-to-csv

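I am developing a Python script that exports Nessus data to CSV and removes duplicate data. Because of the way the results are exported, each distinct port and protocol gets its own row, even when the data in every other column is identical. I need to remove these duplicates, but I want to keep the Port and Protocol column data and append it to the preceding row.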

Here is a very small CSV that I am using to test and build the script:

Screenshot of CSV File

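As you can see, every field is exactly the same except Port, and sometimes Protocol differs as well. So I need to read both rows of the CSV file and combine the ports like this: 80,443, and the protocols the same way: tcp,tcp.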

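Then I save only one row, so the duplicate is removed. I have tried to do this by checking whether the Plugin ID has already been seen, but my output only prints the Port and Protocol of the second row.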
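One way to express the merge described above, as a minimal sketch assuming Python 3.7+ (for insertion-ordered dicts) and that two rows count as duplicates when every column other than Port and Protocol matches: rows are grouped on a key built from those other columns, and the Port and Protocol values of later duplicates are appended with commas. `dedupe_rows`, `dedupe_csv`, `MERGE_FIELDS`, and the sample rows are hypothetical illustrations, not part of the original script or of Nessus.

```python
import csv
import io

# Columns whose values are combined instead of compared (per the question).
MERGE_FIELDS = ('Port', 'Protocol')


def dedupe_rows(reader):
    """Collapse rows that differ only in MERGE_FIELDS, joining those fields with commas."""
    groups = {}  # key -> merged row; dicts preserve insertion order (Python 3.7+)
    for row in reader:
        # Build the key from every column except the ones being merged.
        key = tuple((name, value) for name, value in row.items()
                    if name not in MERGE_FIELDS)
        if key in groups:
            merged = groups[key]
            for field in MERGE_FIELDS:
                merged[field] += ',' + row[field]
        else:
            groups[key] = dict(row)
    return list(groups.values())


def dedupe_csv(infile, outfile):
    """Read CSV from infile, merge duplicate rows, and write the result to outfile."""
    reader = csv.DictReader(infile)
    rows = dedupe_rows(reader)
    writer = csv.DictWriter(outfile, fieldnames=reader.fieldnames)
    writer.writeheader()
    writer.writerows(rows)


# Hypothetical sample data in the shape of the export (not real Nessus output).
sample = io.StringIO(
    "Plugin ID,Host,Protocol,Port,Name\n"
    "11111,10.0.0.1,tcp,80,Example finding\n"
    "11111,10.0.0.1,tcp,443,Example finding\n"
    "22222,10.0.0.1,udp,161,Other finding\n"
)
out = io.StringIO()
dedupe_csv(sample, out)
print(out.getvalue())
```

To run it on real files, open the input with `newline=''` and the output with `'w', newline=''`, as the `csv` module documentation recommends, and pass both file objects to `dedupe_csv`. Because the key covers the whole row, the same Plugin ID reported on two different hosts stays as two rows; the combined fields come out quoted (for example `"80,443"`) since they contain the delimiter.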

import csv

# csv_file_input and csv_file_output hold the input/output paths (set elsewhere)
protocollist = []
portlist = []
pluginid_list = []
multiple = False

with open(csv_file_input, newline='') as csvfile:
    nessusreader = csv.DictReader(csvfile)
    for row in nessusreader:
        pluginid = row['Plugin ID']
        if pluginid != '':
            pluginid_list.append(pluginid)
            print(pluginid_list)
        # The current ID was just appended, so for a non-empty Plugin ID
        # count is always at least 1 and the check below passes on every row.
        count = pluginid_list.count(pluginid)
        cve = row['CVE']
        if count > 0:
            # These lists are never reset, so ports and protocols from
            # every Plugin ID accumulate in the same two lists.
            protocollist.append(row['Protocol'])
            print(protocollist)
            portlist.append(row['Port'])
            print(portlist)
            print('Counted more than 1')
            multiple = True
        if multiple:
            protocol = ', '.join(protocollist)
            port = ', '.join(portlist)
        else:
            protocol = row['Protocol']
            port = row['Port']
        # These variables are overwritten on every iteration, so only the
        # values from the last row are still set after the loop.
        cvss = row['CVSS']
        risk = row['Risk']
        host = row['Host']
        name = row['Name']
        synopsis = row['Synopsis']
        description = row['Description']
        solution = row['Solution']
        seealso = row['See Also']
        pluginoutput = row['Plugin Output']

with open(csv_file_output, 'w', newline='') as csvfile:
    fieldnames = ['Plugin ID', 'CVE', 'CVSS', 'Risk', 'Host', 'Protocol', 'Port',
                  'Name', 'Synopsis', 'Description', 'Solution', 'See Also',
                  'Plugin Output']
    writer = csv.DictWriter(csvfile, fieldnames=fieldnames)

    writer.writeheader()
    # A single writerow call after the loop writes exactly one data row.
    writer.writerow({'Plugin ID': pluginid, 'CVE': cve, 'CVSS': cvss, 'Risk': risk,
                     'Host': host, 'Protocol': protocol, 'Port': port, 'Name': name,
                     'Synopsis': synopsis, 'Description': description,
                     'Solution': solution, 'See Also': seealso,
                     'Plugin Output': pluginoutput})

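There are probably some errors in the code, since I have been trying different things; I just wanted to show what I have been working on to give more context for the problem. The code works if the CSV contains only the two rows shown, but when I add a third set of data with a different Plugin ID, its port and protocol also get added to the same lists, probably because the if statement checks for > 0.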
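The `> 0` behaviour can be reproduced in isolation. Because the script appends the Plugin ID before counting it, `count` is already 1 on the first occurrence, so `count > 0` is true for every row, and the single shared `portlist` collects ports from every plugin. The minimal sketch below, using made-up plugin IDs and ports, contrasts that pattern with checking membership before recording the row and keeping one list per plugin in a `collections.defaultdict`. Keying on Plugin ID alone is a simplification for illustration.

```python
from collections import defaultdict

# Hypothetical (plugin_id, port) pairs standing in for CSV rows.
rows = [('11111', '80'), ('11111', '443'), ('22222', '161')]

# Pattern from the question: append the ID first, then count it.
pluginid_list = []
counts_seen = []
for plugin_id, _port in rows:
    pluginid_list.append(plugin_id)
    counts_seen.append(pluginid_list.count(plugin_id))
print(counts_seen)  # [1, 2, 1]: never 0, so "count > 0" is always true

# Alternative: test membership *before* recording the row, and keep a
# separate port list per plugin so a new Plugin ID starts from scratch.
ports_by_plugin = defaultdict(list)
for plugin_id, port in rows:
    # `in` on a defaultdict does not create the key, so this sees only earlier rows.
    status = 'duplicate' if plugin_id in ports_by_plugin else 'first occurrence'
    ports_by_plugin[plugin_id].append(port)
    print(plugin_id, status)

joined = {pid: ','.join(ports) for pid, ports in ports_by_plugin.items()}
print(joined)  # {'11111': '80,443', '22222': '161'}
```

In real Nessus output the key would likely need to include at least Host as well; otherwise the same plugin reported on different hosts would be merged into one row.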

0 answers:

No answers yet.