Trying to load a json file into a flattened pandas DataFrame

Asked: 2018-06-05 19:50:53

Tags: python json pandas dictionary

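I am trying to load a json file from nist.gov into a pandas DataFrame without nested dicts, so that I end up with flattened records in the DataFrame. Nested lists are fine, since I will stack and merge them later. The goal is to end up with vulnerability information for the affected products.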

import pandas as pd

pd.set_option('display.max_colwidth', 80)  # set pandas column width to facilitate viewing
df = pd.read_json('https://nvd.nist.gov/feeds/json/cve/1.0/nvdcve-1.0-recent.json.zip', compression='zip')  # load json file from nist

The values in df include nested dictionaries.

df.head(2)

  CVE_data_type CVE_data_format  CVE_data_version  CVE_data_numberOfCVEs CVE_data_timestamp                                                                        CVE_Items
0           CVE           MITRE                 4                    640  2018-06-05T18:00Z  {'cve': {'data_type': 'CVE', 'data_format': 'MITRE', 'data_version': '4.0', ...
1           CVE           MITRE                 4                    640  2018-06-05T18:00Z  {'cve': {'data_type': 'CVE', 'data_format': 'MITRE', 'data_version': '4.0', ...

When I expand df.CVE_Items into a CVE_items DataFrame, I get more nested dicts.

CVE_items = df.CVE_Items.apply(pd.Series)
CVE_items.head(2)
                                                                               cve                                                                   configurations                                                                           impact      publishedDate   lastModifiedDate
0  {'data_type': 'CVE', 'data_format': 'MITRE', 'data_version': '4.0', 'CVE_dat...  {'CVE_data_version': '4.0', 'nodes': [{'operator': 'OR', 'cpe': [{'vulnerabl...  {'baseMetricV2': {'cvssV2': {'version': '2.0', 'vectorString': '(AV:N/AC:M/A...  2011-12-27T11:55Z  2018-06-04T13:46Z
1  {'data_type': 'CVE', 'data_format': 'MITRE', 'data_version': '4.0', 'CVE_dat...  {'CVE_data_version': '4.0', 'nodes': [{'operator': 'OR', 'cpe': [{'vulnerabl...  {'baseMetricV3': {'cvssV3': {'version': '3.0', 'vectorString': 'CVSS:3.0/AV:...  2018-04-24T20:29Z  2018-06-04T16:11Z

If I keep expanding the newly formed DataFrames, the plot thickens: I get more nested dicts and/or lists containing nested dicts.

cve = CVE_items.cve.apply(pd.Series)
configurations = CVE_items.configurations.apply(pd.Series)
impact = CVE_items.impact.apply(pd.Series)

cve.head(2)
  data_type data_format data_version                                         CVE_data_meta                                                                          affects                                                                      problemtype                                                                       references                                                                      description
0       CVE       MITRE          4.0  {'ID': 'CVE-2011-3841', 'ASSIGNER': 'cve@mitre.org'}  {'vendor': {'vendor_data': [{'vendor_name': 'wpsymposiumpro', 'product': {'p...     {'problemtype_data': [{'description': [{'lang': 'en', 'value': 'CWE-79'}]}]}  {'reference_data': [{'url': 'http://secunia.com/advisories/47243', 'name': '...  {'description_data': [{'lang': 'en', 'value': 'Cross-site scripting (XSS) vu...
1       CVE       MITRE          4.0  {'ID': 'CVE-2013-3947', 'ASSIGNER': 'cve@mitre.org'}  {'vendor': {'vendor_data': [{'vendor_name': 'ahnlab', 'product': {'product_d...  {'problemtype_data': [{'description': [{'lang': 'en', 'value': 'CWE-119'}, {...  {'reference_data': [{'url': 'http://secunia.com/advisories/54465', 'name': '...  {'description_data': [{'lang': 'en', 'value': 'Buffer overflow in MedCoreD.s...

Any ideas on how to flatten this file?

1 Answer:

Answer 0: (score: 0)

It turns out that pandas provides the functionality needed to expand embedded json objects.
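As a small offline illustration of what that flattening looks like, here is a minimal sketch on a hand-made record. The record is hypothetical: it is only loosely shaped like one entry of CVE_Items, and its values are invented. It uses the top-level name pd.json_normalize, which is available in pandas 1.0 and later.

```python
import pandas as pd

# Hypothetical record, loosely shaped like one entry of CVE_Items (values invented)
sample_items = [
    {
        "cve": {
            "CVE_data_meta": {"ID": "CVE-0000-0001", "ASSIGNER": "someone@example.org"}
        },
        "impact": {"baseMetricV2": {"severity": "HIGH"}},
        "publishedDate": "2019-01-15T21:29Z",
    }
]

# Nested dicts become columns whose names join the keys with dots
flat = pd.json_normalize(sample_items)
print(sorted(flat.columns))
```

Each level of nesting turns into one more dotted segment in the column name, so no dict-valued cells remain. List-valued fields are left as lists.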

import pandas as pd
from pandas.io.json import json_normalize  # in pandas 1.0 and later, use pd.json_normalize instead (this import path is deprecated)

df = pd.read_json('https://nvd.nist.gov/feeds/json/cve/1.0/nvdcve-1.0-2019.json.zip', compression='zip')

The df DataFrame contains an embedded json object in df['CVE_Items'].

df.head()
  CVE_data_type CVE_data_format  CVE_data_version  CVE_data_numberOfCVEs CVE_data_timestamp                                          CVE_Items
0           CVE           MITRE                 4                   2510  2019-04-23T07:00Z  {'cve': {'data_type': 'CVE', 'data_format': 'M...
1           CVE           MITRE                 4                   2510  2019-04-23T07:00Z  {'cve': {'data_type': 'CVE', 'data_format': 'M...
2           CVE           MITRE                 4                   2510  2019-04-23T07:00Z  {'cve': {'data_type': 'CVE', 'data_format': 'M...
3           CVE           MITRE                 4                   2510  2019-04-23T07:00Z  {'cve': {'data_type': 'CVE', 'data_format': 'M...
4           CVE           MITRE                 4                   2510  2019-04-23T07:00Z  {'cve': {'data_type': 'CVE', 'data_format': 'M...

I use json_normalize to create a new DataFrame from the expanded json object.

df_CVE_Items = json_normalize(df['CVE_Items'])

df_CVE_Items.head()
  configurations.CVE_data_version                               configurations.nodes cve.CVE_data_meta.ASSIGNER cve.CVE_data_meta.ID                     cve.affects.vendor.vendor_data cve.data_format cve.data_type cve.data_version                   cve.description.description_data                   cve.problemtype.problemtype_data                      cve.references.reference_data impact.baseMetricV2.acInsufInfo impact.baseMetricV2.cvssV2.accessComplexity impact.baseMetricV2.cvssV2.accessVector impact.baseMetricV2.cvssV2.authentication impact.baseMetricV2.cvssV2.availabilityImpact  impact.baseMetricV2.cvssV2.baseScore impact.baseMetricV2.cvssV2.confidentialityImpact impact.baseMetricV2.cvssV2.integrityImpact impact.baseMetricV2.cvssV2.vectorString impact.baseMetricV2.cvssV2.version  impact.baseMetricV2.exploitabilityScore  impact.baseMetricV2.impactScore impact.baseMetricV2.obtainAllPrivilege impact.baseMetricV2.obtainOtherPrivilege impact.baseMetricV2.obtainUserPrivilege impact.baseMetricV2.severity impact.baseMetricV2.userInteractionRequired impact.baseMetricV3.cvssV3.attackComplexity impact.baseMetricV3.cvssV3.attackVector impact.baseMetricV3.cvssV3.availabilityImpact  impact.baseMetricV3.cvssV3.baseScore impact.baseMetricV3.cvssV3.baseSeverity impact.baseMetricV3.cvssV3.confidentialityImpact impact.baseMetricV3.cvssV3.integrityImpact impact.baseMetricV3.cvssV3.privilegesRequired impact.baseMetricV3.cvssV3.scope impact.baseMetricV3.cvssV3.userInteraction       impact.baseMetricV3.cvssV3.vectorString impact.baseMetricV3.cvssV3.version  impact.baseMetricV3.exploitabilityScore  impact.baseMetricV3.impactScore   lastModifiedDate      publishedDate
0                             4.0  [{'operator': 'OR', 'cpe_match': [{'vulnerable...              cve@mitre.org        CVE-2019-0001  [{'vendor_name': 'juniper', 'product': {'produ...           MITRE           CVE              4.0  [{'lang': 'en', 'value': 'Receipt of a malform...  [{'description': [{'lang': 'en', 'value': 'CWE...  [{'url': 'http://www.securityfocus.com/bid/106...                           False                                      MEDIUM                                 NETWORK                                      NONE                                      COMPLETE                                   7.1                                             NONE                                       NONE              AV:N/AC:M/Au:N/C:N/I:N/A:C                                2.0                                      8.6                              6.9                                  False                                    False                                   False                         HIGH                                       False                                        HIGH                                 NETWORK                                          HIGH                                   5.9                                  MEDIUM                                             NONE                                       NONE                                          NONE                        UNCHANGED                                       NONE  CVSS:3.0/AV:N/AC:H/PR:N/UI:N/S:U/C:N/I:N/A:H                                3.0                                      2.2                              3.6  2019-02-14T18:35Z  2019-01-15T21:29Z
1                             4.0  [{'operator': 'OR', 'cpe_match': [{'vulnerable...              cve@mitre.org        CVE-2019-0002  [{'vendor_name': 'juniper', 'product': {'produ...           MITRE           CVE              4.0  [{'lang': 'en', 'value': 'On EX2300 and EX3400...  [{'description': [{'lang': 'en', 'value': 'CWE...  [{'url': 'http://www.securityfocus.com/bid/106...                           False                                         LOW                                 NETWORK                                      NONE                                       PARTIAL                                   7.5                                          PARTIAL                                    PARTIAL              AV:N/AC:L/Au:N/C:P/I:P/A:P                                2.0                                     10.0                              6.4                                  False                                    False                                   False                         HIGH                                       False                                         LOW                                 NETWORK                                          HIGH                                   9.8                                CRITICAL                                             HIGH                                       HIGH                                          NONE                        UNCHANGED                                       NONE  CVSS:3.0/AV:N/AC:L/PR:N/UI:N/S:U/C:H/I:H/A:H                                3.0                                      3.9                              5.9  2019-02-08T18:50Z  2019-01-15T21:29Z
2                             4.0  [{'operator': 'AND', 'children': [{'operator':...              cve@mitre.org        CVE-2019-0003  [{'vendor_name': 'juniper', 'product': {'produ...           MITRE           CVE              4.0  [{'lang': 'en', 'value': 'When a specific BGP ...  [{'description': [{'lang': 'en', 'value': 'CWE...  [{'url': 'http://www.securityfocus.com/bid/106...                           False                                      MEDIUM                                 NETWORK                                      NONE                                       PARTIAL                                   4.3                                             NONE                                       NONE              AV:N/AC:M/Au:N/C:N/I:N/A:P                                2.0                                      8.6                              2.9                                  False                                    False                                   False                       MEDIUM                                       False                                        HIGH                                 NETWORK                                          HIGH                                   5.9                                  MEDIUM                                             NONE                                       NONE                                          NONE                        UNCHANGED                                       NONE  CVSS:3.0/AV:N/AC:H/PR:N/UI:N/S:U/C:N/I:N/A:H                                3.0                                      2.2                              3.6  2019-02-07T15:52Z  2019-01-15T21:29Z
3                             4.0  [{'operator': 'AND', 'children': [{'operator':...              cve@mitre.org        CVE-2019-0004                                                 []           MITRE           CVE              4.0  [{'lang': 'en', 'value': 'On Juniper ATP, the ...  [{'description': [{'lang': 'en', 'value': 'CWE...  [{'url': 'https://kb.juniper.net/JSA10918', 'n...                           False                                         LOW                                   LOCAL                                      NONE                                          NONE                                   2.1                                          PARTIAL                                       NONE              AV:L/AC:L/Au:N/C:P/I:N/A:N                                2.0                                      3.9                              2.9                                  False                                    False                                   False                          LOW                                       False                                         LOW                                   LOCAL                                          NONE                                   5.5                                  MEDIUM                                             HIGH                                       NONE                                           LOW                        UNCHANGED                                       NONE  CVSS:3.0/AV:L/AC:L/PR:L/UI:N/S:U/C:H/I:N/A:N                                3.0                                      1.8                              3.6  2019-01-29T16:40Z  2019-01-15T21:29Z
4                             4.0  [{'operator': 'AND', 'children': [{'operator':...              cve@mitre.org        CVE-2019-0005  [{'vendor_name': 'juniper', 'product': {'produ...           MITRE           CVE              4.0  [{'lang': 'en', 'value': 'On EX2300, EX3400, E...  [{'description': [{'lang': 'en', 'value': 'CWE...  [{'url': 'http://www.securityfocus.com/bid/106...                           False                                         LOW                                 NETWORK                                      NONE                                          NONE                                   5.0                                             NONE                                    PARTIAL              AV:N/AC:L/Au:N/C:N/I:P/A:N                                2.0                                     10.0                              2.9                                  False                                    False                                   False                       MEDIUM                                       False                                         LOW                                 NETWORK                                          NONE                                   5.3                                  MEDIUM                                             NONE                                        LOW                                          NONE                        UNCHANGED                                       NONE  CVSS:3.0/AV:N/AC:L/PR:N/UI:N/S:U/C:N/I:L/A:N                                3.0                                      3.9                              1.4  2019-02-14T18:40Z  2019-01-15T21:29Z
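
The output above still holds lists in columns such as cve.affects.vendor.vendor_data, and the question asks for the affected products. One way to get one row per CVE and product is a plain loop over the nested lists, sketched below on invented sample data. The keys vendor_data, vendor_name and product appear in the output above; product_data and product_name are assumed from the NVD 1.0 feed layout, since the output truncates them. The result column names are my own choice.

```python
import pandas as pd

# Hypothetical items mimicking the nesting under cve.affects (values invented)
sample_items = [
    {
        "cve": {
            "CVE_data_meta": {"ID": "CVE-0000-0001"},
            "affects": {"vendor": {"vendor_data": [
                {
                    "vendor_name": "vendor_a",
                    # product_data / product_name are assumed key names
                    "product": {"product_data": [
                        {"product_name": "widget"},
                        {"product_name": "gadget"},
                    ]},
                },
            ]}},
        }
    },
    {
        # A CVE with no affected vendors listed contributes no rows
        "cve": {
            "CVE_data_meta": {"ID": "CVE-0000-0002"},
            "affects": {"vendor": {"vendor_data": []}},
        }
    },
]

rows = []
for item in sample_items:
    cve = item["cve"]
    cve_id = cve["CVE_data_meta"]["ID"]
    for vendor in cve["affects"]["vendor"]["vendor_data"]:
        for product in vendor["product"]["product_data"]:
            rows.append({
                "cve_id": cve_id,
                "vendor_name": vendor["vendor_name"],
                "product_name": product["product_name"],
            })

# Passing columns explicitly keeps the frame well-formed even if rows is empty
products = pd.DataFrame(rows, columns=["cve_id", "vendor_name", "product_name"])
print(products)
```

The same loop should work over df['CVE_Items'] in place of sample_items, and the resulting frame can then be merged back onto df_CVE_Items using the CVE ID.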