Python Boto3 - Data not written to DynamoDB correctly

Posted: 2018-02-27 15:54:56

Tags: python xml amazon-dynamodb

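I have an XML file that I am parsing with Python and writing to a DynamoDB table in AWS. The tags of interest are <IMAGE_ID> and <CVSS_FINAL>. When I iterate over the values and print() them, all of them are returned. However, when I write to Dynamo, only one row of data is written. I don't understand why print() returns everything but only one row reaches the datastore.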

Code:

import boto3
import lxml
from lxml import etree

def WriteItemToTable():
    s3 = boto3.resource('s3')
    bucket = 'xxxxxxxxxxxx'
    key = 'vuln_data.xml'
    dynamo = boto3.client('dynamodb')

    obj = s3.Object('xxxxxxxxxx', 'vuln_data.xml')
    body = obj.get()['Body'].read()

    image_id = etree.fromstring(body).findall('HOST_LIST/HOST/EC2_INFO/IMAGE_ID')
    risk_score = etree.fromstring(body).findall('HOST_LIST/HOST/VULN_INFO_LIST/VULN_INFO/CVSS_FINAL')

    for el in image_id:
        i = el.text
        print(i)

    for el in risk_score:
        j = el.text
        print(j)

    response = dynamo.put_item(
        TableName='ExistingAMI',
        Item={
            'AMI_ID': {
                'S': i
            },
            'CVSS_SCORE': {
                'S': j
            },
        }
    )

WriteItemToTable()

XML:

<HOST_LIST>
    <HOST>
      <EC2_INFO>
        <PUBLIC_DNS_NAME><![CDATA[ec2-xxxxxxxxxxx.compute-1.amazonaws.com]]></PUBLIC_DNS_NAME>
        <IMAGE_ID><![CDATA[ami-xxxxxx]]></IMAGE_ID>
      </EC2_INFO>
      <OPERATING_SYSTEM><![CDATA[Linux x.y]]></OPERATING_SYSTEM>
      <VULN_INFO_LIST>
        <VULN_INFO>
          <QID id="qid_x">x</QID>
          <TYPE>Vuln</TYPE>
          <CVSS_FINAL>3.5</CVSS_FINAL>
          <RESULT><![CDATA[TLSv1.0 is supported]]></RESULT>
        </VULN_INFO>
        <VULN_INFO>
      <QID id="qid_xxxx">xxxxx</QID>
      <CVSS_FINAL>2.1</CVSS_FINAL>
    </VULN_INFO>
    <VULN_INFO>
      <QID id="qid_xxxx">xxxx</QID>
      <CVSS_FINAL>4.3</CVSS_FINAL>
      <RESULT><![CDATA[TLSv1.0 is supported]]></RESULT>
    </VULN_INFO>
    </VULN_INFO_LIST>
    </HOST>
    <HOST>
      <EC2_INFO>
        <PUBLIC_DNS_NAME><![CDATA[ec2-xxxxxxxxx.compute-1.amazonaws.com]]></PUBLIC_DNS_NAME>
        <IMAGE_ID><![CDATA[ami-yyyyyy]]></IMAGE_ID>
      </EC2_INFO>
      <OPERATING_SYSTEM><![CDATA[Amazon Linux]]></OPERATING_SYSTEM>
      <VULN_INFO_LIST>
        <VULN_INFO>
          <QID id="x">x</QID>
          <CVSS_FINAL>3.6</CVSS_FINAL>
        </VULN_INFO>
    </VULN_INFO_LIST>
    </HOST>
</HOST_LIST>

print() output:

ami-xxxxxx
ami-yyyyyy
3.5
3.6

Resulting table:

[Image: Dynamo Table]

1 Answer:

Answer 0 (score: 0)

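While I know nothing about DynamoDB, your Python code can only ever pass one i value and one j value, because your dynamo.put_item block is not nested inside a for loop and therefore picks up the last values assigned to them.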
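The effect described above can be reproduced without AWS. The following is a minimal sketch, not the asker's code: the list image_ids and the list written (standing in for the DynamoDB table) are hypothetical names introduced only for illustration.

```python
# Hypothetical stand-in values; in the question these come from the parsed XML.
image_ids = ["ami-xxxxxx", "ami-yyyyyy"]

# Hypothetical stand-in for the DynamoDB table.
written = []

for image_id in image_ids:
    i = image_id
    print(i)  # runs on every iteration, so every value is printed

# This statement sits outside the loop, so it runs once,
# and i still holds whatever the final iteration assigned.
written.append(i)

print(written)
```

Moving the append (or, in the question, the put_item call) inside the loop body makes it run once per element instead of once in total.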

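Simply run the image_id and risk_score searches together in a single loop nested at the <HOST> level, and consider the xpath() method that lxml provides. Also, you do not need import lxml when you call its methods through from lxml import etree.

    # Inside WriteItemToTable(), in place of the two findall() calls and the code after them:
    doc = etree.fromstring(body)    # PARSE ONLY ONCE
    hosts = doc.xpath('//HOST')

    for h in hosts:
        i = h.xpath('EC2_INFO/IMAGE_ID')[0].text
        print(i)
        j = h.xpath('VULN_INFO_LIST/VULN_INFO/CVSS_FINAL')[0].text
        print(j)

        response = dynamo.put_item(
            TableName='ExistingAMI',
            Item={
                'AMI_ID': {
                    'S': i
                },
                'CVSS_SCORE': {
                    'S': j
                },
            }
        )

WriteItemToTable()
# ami-xxxxxx
# 3.5
# ami-yyyyyy
# 3.6

For multiple CVSS_FINAL values, use XPath to parse down to <CVSS_FINAL> and then use ancestor::* to retrieve the corresponding IMAGE_ID.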
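For the case of several CVSS_FINAL values per host, one possible shape of the ancestor lookup is sketched below. It assumes lxml is installed and uses a trimmed-down sample shaped like the XML in the question. The pairs list is a hypothetical stand-in for the put_item calls, and ancestor::HOST is used here as a narrower form of ancestor::*.

```python
from lxml import etree

# Trimmed-down sample shaped like the XML in the question (illustrative only).
xml = b"""<HOST_LIST>
  <HOST>
    <EC2_INFO><IMAGE_ID><![CDATA[ami-xxxxxx]]></IMAGE_ID></EC2_INFO>
    <VULN_INFO_LIST>
      <VULN_INFO><CVSS_FINAL>3.5</CVSS_FINAL></VULN_INFO>
      <VULN_INFO><CVSS_FINAL>2.1</CVSS_FINAL></VULN_INFO>
    </VULN_INFO_LIST>
  </HOST>
  <HOST>
    <EC2_INFO><IMAGE_ID><![CDATA[ami-yyyyyy]]></IMAGE_ID></EC2_INFO>
    <VULN_INFO_LIST>
      <VULN_INFO><CVSS_FINAL>3.6</CVSS_FINAL></VULN_INFO>
    </VULN_INFO_LIST>
  </HOST>
</HOST_LIST>"""

doc = etree.fromstring(xml)  # parse only once

# Hypothetical stand-in for the DynamoDB writes: one (AMI, score) pair per score.
pairs = []
for score in doc.xpath('//CVSS_FINAL'):
    # Climb from the score element to its enclosing <HOST>,
    # then descend to that host's IMAGE_ID.
    image_id = score.xpath('ancestor::HOST/EC2_INFO/IMAGE_ID')[0].text
    pairs.append((image_id, score.text))

for image_id, cvss in pairs:
    print(image_id, cvss)
```

The table's key schema is not shown in the question. If AMI_ID alone is the primary key, put_item replaces any existing item with the same key, so several scores for one AMI would overwrite each other unless the key also includes something that differs per vulnerability.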