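Read an XML file and output specific fields to CSV

Time: 2020-02-20 17:02:44

Tags: python xml formatting

I am trying to format an NVD CVE XML file into a CSV file for validation testing.

Here is an excerpt from the XML file: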

<?xml version="1.0" ?>
<all>
    <CVE_data_type type="str">CVE</CVE_data_type>
    <CVE_data_format type="str">MITRE</CVE_data_format>
    <CVE_data_version type="str">4.0</CVE_data_version>
    <CVE_data_numberOfCVEs type="str">6418</CVE_data_numberOfCVEs>
    <CVE_data_timestamp type="str">2020-01-26T08:30Z</CVE_data_timestamp>
    <CVE_Items type="list">
        <item type="dict">
            <cve type="dict">
                <data_type type="str">CVE</data_type>
                <data_format type="str">MITRE</data_format>
                <data_version type="str">4.0</data_version>
                <CVE_data_meta type="dict">
                    <ID type="str">CVE-2013-0001</ID>
                    <ASSIGNER type="str">cve@mitre.org</ASSIGNER>
                </CVE_data_meta>
                <problemtype type="dict">
                    <problemtype_data type="list">
                        <item type="dict">
                            <description type="list">
                                <item type="dict">
                                    <lang type="str">en</lang>
                                    <value type="str">CWE-200</value>
                                </item>
                            </description>
                        </item>
                    </problemtype_data>
                </problemtype>
                <references type="dict">
                    <reference_data type="list">
                        <item type="dict">
                            <url type="str">
                                https://docs.microsoft.com/en-us/security-updates/securitybulletins/2013/ms13-004
                            </url>
                            <name type="str">MS13-004</name>
                            <refsource type="str">MS</refsource>
                            <tags type="list"/>
                        </item>
                        <item type="dict">
                            <url type="str">
                                https://oval.cisecurity.org/repository/search/definition/oval%3Aorg.mitre.oval%3Adef%3A15814
                            </url>
                            <name type="str">oval:org.mitre.oval:def:15814</name>
                            <refsource type="str">OVAL</refsource>
                            <tags type="list"/>
                        </item>
                    </reference_data>
                </references>
                <description type="dict">
                    <description_data type="list">
                        <item type="dict">
                            <lang type="str">en</lang>
                            <value type="str">The Windows Forms (aka WinForms) component in Microsoft .NET Framework 1.0
                                SP3, 1.1 SP1, 2.0 SP2, 3.0 SP2, 4, and 4.5 does not properly initialize memory arrays,
                                which allows remote attackers to obtain sensitive information via (1) a crafted XAML
                                browser application (XBAP) or (2) a crafted .NET Framework application that leverages a
                                pointer to an unmanaged memory location, aka &quot;System Drawing Information Disclosure
                                Vulnerability.&quot;
                            </value>
                        </item>
                        <item type="dict">
                            <lang type="str">en</lang>
                            <value type="str">Per http://technet.microsoft.com/en-us/security/bulletin/ms13-004
                                Microsoft .NET Framework 3.0 Service Pack 2 is not vulnerable.
                            </value>
                        </item>
                    </description_data>
                </description>
            </cve>
            <configurations type="dict">
                <CVE_data_version type="str">4.0</CVE_data_version>
                <nodes type="list">
                    <item type="dict">
                        <operator type="str">AND</operator>
                        <children type="list">
                            <item type="dict">
                                <operator type="str">OR</operator>
                                <cpe_match type="list">
                                    <item type="dict">
                                        <vulnerable type="bool">True</vulnerable>
                                        <cpe23Uri type="str">cpe:2.3:a:microsoft:.net_framework:1.0:sp3:*:*:*:*:*:*
                                        </cpe23Uri>
                                    </item>
                                </cpe_match>
                            </item>
                            <item type="dict">
                                <operator type="str">OR</operator>
                                <cpe_match type="list">
                                    <item type="dict">
                                        <vulnerable type="bool">False</vulnerable>
                                        <cpe23Uri type="str">
                                            cpe:2.3:o:microsoft:windows_xp:-:sp3:media_center:*:*:*:*:*
                                        </cpe23Uri>
                                    </item>
                                    <item type="dict">
                                        <vulnerable type="bool">False</vulnerable>
                                        <cpe23Uri type="str">cpe:2.3:o:microsoft:windows_xp:-:sp3:tablet_pc:*:*:*:*:*
                                        </cpe23Uri>
                                    </item>
                                </cpe_match>
                            </item>
                        </children>
                    </item>
                    <item type="dict">
                        <operator type="str">AND</operator>
                        <children type="list">
                            <item type="dict">
                                <operator type="str">OR</operator>
                                <cpe_match type="list">
                                    <item type="dict">
                                        <vulnerable type="bool">True</vulnerable>
                                        <cpe23Uri type="str">cpe:2.3:a:microsoft:.net_framework:1.1:sp1:*:*:*:*:*:*
                                        </cpe23Uri>
                                    </item>
                                </cpe_match>
                            </item>
                            <item type="dict">
                                <operator type="str">OR</operator>
                                <cpe_match type="list">
                                    <item type="dict">
                                        <vulnerable type="bool">False</vulnerable>
                                        <cpe23Uri type="str">cpe:2.3:o:microsoft:windows_server_2003:*:sp2:*:*:*:*:*:*
                                        </cpe23Uri>
                                    </item>
                                    <item type="dict">
                                        <vulnerable type="bool">False</vulnerable>
                                        <cpe23Uri type="str">
                                            cpe:2.3:o:microsoft:windows_server_2008:*:sp2:x64:*:*:*:*:*
                                        </cpe23Uri>
                                    </item>
                                    <item type="dict">
                                        <vulnerable type="bool">False</vulnerable>
                                        <cpe23Uri type="str">
                                            cpe:2.3:o:microsoft:windows_server_2008:*:sp2:x86:*:*:*:*:*
                                        </cpe23Uri>
                                    </item>
                                    <item type="dict">
                                        <vulnerable type="bool">False</vulnerable>
                                        <cpe23Uri type="str">
                                            cpe:2.3:o:microsoft:windows_server_2008:-:sp2:itanium:*:*:*:*:*
                                        </cpe23Uri>
                                    </item>
                                    <item type="dict">
                                        <vulnerable type="bool">False</vulnerable>
                                        <cpe23Uri type="str">cpe:2.3:o:microsoft:windows_vista:*:sp2:x64:*:*:*:*:*
                                        </cpe23Uri>
                                    </item>
                                    <item type="dict">
                                        <vulnerable type="bool">False</vulnerable>
                                        <cpe23Uri type="str">cpe:2.3:o:microsoft:windows_vista:-:sp2:*:*:*:*:*:*
                                        </cpe23Uri>
                                    </item>
                                    <item type="dict">
                                        <vulnerable type="bool">False</vulnerable>
                                        <cpe23Uri type="str">cpe:2.3:o:microsoft:windows_xp:*:sp3:*:*:*:*:*:*</cpe23Uri>
                                    </item>
                                    <item type="dict">
                                        <vulnerable type="bool">False</vulnerable>
                                        <cpe23Uri type="str">cpe:2.3:o:microsoft:windows_xp:-:sp2:x64:*:*:*:*:*
                                        </cpe23Uri>
                                    </item>
                                </cpe_match>

So far I have only been able to extract a few fields, as shown in the code below:

import xmltodict

with open("nvd/nvdcve-1.1-2016.xml") as fd:
    doc = xmltodict.parse(fd.read())

    id = doc["all"]["CVE_Items"]["item"][0]["cve"]["CVE_data_meta"]["ID"]["#text"]
    desc = doc["all"]["CVE_Items"]["item"][0]["cve"]["description"]["description_data"][
        "item"
    ]["value"]["#text"]
    url = doc["all"]["CVE_Items"]["item"][0]["cve"]["references"]["reference_data"][
        "@list"
    ]["url"]
    v2complexity = doc["all"]["CVE_Items"]["item"][0]["cve"]["description"][
        "description_data"
    ]["item"]["value"]["#text"]
    v2vector = doc["all"]["CVE_Items"]["item"][0]["cve"]["description"][
        "description_data"
    ]["item"]["value"]["#text"]

When I get to url, no matter how I try to set the key, I keep getting this error:

/usr/local/bin/python3.8 /Users/djobes/PycharmProjects/rainfall/read_xml-old.py
Traceback (most recent call last):
  File "/Users/djobes/PycharmProjects/rainfall/read_xml-old.py", line 9, in <module>
    url = doc['all']['CVE_Items']['item'][0]['cve']['references']['reference_data']['@list']['url']
KeyError: '@list'
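
A likely explanation for the KeyError, assuming xmltodict's default conventions (attributes stored under keys prefixed with "@", element text under "#text", repeated child elements collected into a list): type="list" is an attribute of <reference_data>, so it shows up as the value of the key "@type", and no key named "@list" exists. The references themselves sit under "item". The dictionary below is a hand-written stand-in for that shape, built from the excerpt rather than by running xmltodict, and as_list is a hypothetical helper:

```python
# Hand-written stand-in for what xmltodict is assumed to return for the
# <reference_data type="list"> element in the excerpt (not real parser output).
reference_data = {
    "@type": "list",
    "item": [
        {
            "@type": "dict",
            "url": {
                "@type": "str",
                "#text": "https://docs.microsoft.com/en-us/security-updates/securitybulletins/2013/ms13-004",
            },
            "name": {"@type": "str", "#text": "MS13-004"},
            "refsource": {"@type": "str", "#text": "MS"},
            "tags": {"@type": "list"},
        },
        {
            "@type": "dict",
            "url": {
                "@type": "str",
                "#text": "https://oval.cisecurity.org/repository/search/definition/oval%3Aorg.mitre.oval%3Adef%3A15814",
            },
            "name": {"@type": "str", "#text": "oval:org.mitre.oval:def:15814"},
            "refsource": {"@type": "str", "#text": "OVAL"},
            "tags": {"@type": "list"},
        },
    ],
}


def as_list(value):
    """Hypothetical helper: always hand back a list.

    A lone <item> is assumed to arrive as a dict and several as a list,
    so both cases are normalised here; a missing key becomes an empty list.
    """
    if value is None:
        return []
    return value if isinstance(value, list) else [value]


# "list" is the value stored under "@type"; there is no "@list" key to look up.
has_list_key = "@list" in reference_data

# The references live under "item", and each URL string is under "#text".
urls = [ref["url"]["#text"] for ref in as_list(reference_data.get("item"))]
print(has_list_key)
print(urls)
```

With the real document the equivalent lookup would be doc["all"]["CVE_Items"]["item"][0]["cve"]["references"]["reference_data"]["item"], followed by an index or a loop. As far as I know, xmltodict.parse also takes a force_list argument (for example force_list=("item",)) that should make the list wrapping consistent, but that is worth checking against the installed version.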

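I switched to converting the XML to CSV, rather than the JSON, because the JSON route gave me almost the same kind of error (index out of range).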
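
For the JSON route, an "index out of range" error often comes from indexing [0] into a list that is empty for some entries. That is only a guess, since the failing JSON code is not shown. The sketch below assumes the JSON feed mirrors the XML excerpt (CVE_Items, cve, CVE_data_meta, reference_data, description_data); the second entry is made up to show the empty-list case, and first and to_row are hypothetical helper names:

```python
import csv
import io
import json

# Trimmed sample in the shape the XML excerpt suggests. The second entry is
# invented purely to exercise the empty-list case.
SAMPLE_JSON = """
{
  "CVE_Items": [
    {"cve": {
      "CVE_data_meta": {"ID": "CVE-2013-0001"},
      "references": {"reference_data": [
        {"url": "https://docs.microsoft.com/en-us/security-updates/securitybulletins/2013/ms13-004"},
        {"url": "https://oval.cisecurity.org/repository/search/definition/oval%3Aorg.mitre.oval%3Adef%3A15814"}
      ]},
      "description": {"description_data": [
        {"lang": "en", "value": "Microsoft .NET Framework 3.0 Service Pack 2 is not vulnerable."}
      ]}
    }},
    {"cve": {
      "CVE_data_meta": {"ID": "HYPOTHETICAL-EMPTY"},
      "references": {"reference_data": []},
      "description": {"description_data": []}
    }}
  ]
}
"""


def first(items, key, default=""):
    """Return items[0][key], or default when the list is empty or the key is absent."""
    return items[0].get(key, default) if items else default


def to_row(item):
    """Flatten one CVE entry into [id, description, urls] without assuming lists are non-empty."""
    cve = item.get("cve", {})
    refs = cve.get("references", {}).get("reference_data", [])
    descs = cve.get("description", {}).get("description_data", [])
    return [
        cve.get("CVE_data_meta", {}).get("ID", ""),
        first(descs, "value"),
        " ".join(ref.get("url", "") for ref in refs),
    ]


data = json.loads(SAMPLE_JSON)
table = [to_row(item) for item in data.get("CVE_Items", [])]

# Written to an in-memory buffer here; a real script would open a file instead.
out = io.StringIO()
writer = csv.writer(out)
writer.writerow(["id", "description", "urls"])
writer.writerows(table)
print(out.getvalue())
```

Looping over every entry and guarding the [0] lookups means an entry without references or descriptions produces empty cells instead of an exception.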

Any help or pointers would be appreciated.

TIA

1 answer:

Answer 0 (score: 0)

Changing the following line let me index correctly, but I have been told that I need to work with the JSON file:

id = doc["all"]["CVE_Items"]["item"][0]["cve"]["CVE_data_meta"]["ID"]["#text"]
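
If staying with the XML is an option, one alternative that avoids the "@" and "#text" key conventions entirely is the standard library's xml.etree.ElementTree together with csv. This is a minimal sketch and not the approach used above: the sample is a trimmed copy of the excerpt from the question, the column names are my own choice, and clean and rows are hypothetical helper names.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Trimmed copy of the excerpt from the question, kept small for illustration.
SAMPLE = """<?xml version="1.0" ?>
<all>
    <CVE_Items type="list">
        <item type="dict">
            <cve type="dict">
                <CVE_data_meta type="dict">
                    <ID type="str">CVE-2013-0001</ID>
                </CVE_data_meta>
                <references type="dict">
                    <reference_data type="list">
                        <item type="dict">
                            <url type="str">
                                https://docs.microsoft.com/en-us/security-updates/securitybulletins/2013/ms13-004
                            </url>
                        </item>
                    </reference_data>
                </references>
                <description type="dict">
                    <description_data type="list">
                        <item type="dict">
                            <lang type="str">en</lang>
                            <value type="str">Per http://technet.microsoft.com/en-us/security/bulletin/ms13-004
                                Microsoft .NET Framework 3.0 Service Pack 2 is not vulnerable.
                            </value>
                        </item>
                    </description_data>
                </description>
            </cve>
        </item>
    </CVE_Items>
</all>
"""


def clean(text):
    """Collapse the newlines and indentation left over from pretty-printing."""
    return " ".join((text or "").split())


def rows(root):
    """Yield one dict per top-level CVE entry."""
    # "./CVE_Items/item" matches only the entries directly under CVE_Items,
    # not the nested <item> elements further down.
    for item in root.findall("./CVE_Items/item"):
        cve = item.find("cve")
        if cve is None:
            continue
        yield {
            "id": clean(cve.findtext("CVE_data_meta/ID")),
            "description": " | ".join(
                clean(v.text)
                for v in cve.findall("description/description_data/item/value")
            ),
            "urls": " ".join(
                clean(u.text)
                for u in cve.findall("references/reference_data/item/url")
            ),
        }


root = ET.fromstring(SAMPLE)

# Written to an in-memory buffer here; a real script would open a file instead.
out = io.StringIO()
writer = csv.DictWriter(out, fieldnames=["id", "description", "urls"])
writer.writeheader()
writer.writerows(rows(root))
csv_text = out.getvalue()
print(csv_text)
```

For the full file, ET.parse("nvd/nvdcve-1.1-2016.xml").getroot() would take the place of ET.fromstring(SAMPLE). Joining multiple descriptions with " | " and multiple URLs with a space is just one way to fit repeated values into a single CSV cell.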