Accessing zip files and folders in S3 from an EC2 instance using the boto3 package in Python

Time: 2018-07-26 23:09:14

Tags: python xml amazon-web-services

I have a problem: I want to look at a zip file that is going to be placed in a bucket, and it has the following structure:

main.zip ==> main/folder1/subfolder/file.xml ==> main/folder2/file.xml

Using a Lambda function, I need to capture the paths of the xml files inside the zip file and then parse them with xml.dom.minidom.

I am doing this:

import io
import urllib
import zipfile

import boto3

s3 = boto3.client('s3')

def lambda_handler(event, context):
    # Bucket and key come from the S3 event notification (Python 2 style URL-decoding)
    key = urllib.unquote_plus(event['Records'][0]['s3']['object']['key'].encode('utf8'))
    bucket = urllib.unquote_plus(event['Records'][0]['s3']['bucket']['name'].encode('utf8'))
    obj = s3.get_object(Bucket=bucket, Key=key)
    with io.BytesIO(obj["Body"].read()) as tf:
        # rewind the file
        tf.seek(0)
        # Read the file as a zipfile and process the members
        with zipfile.ZipFile(tf, mode='r') as zipf:
            for file in zipf.infolist():
                fileName = file.filename
                # Upload each extracted member back to the bucket under the Open/ prefix
                putFile = s3.put_object(Bucket=bucket, Key='Open/' + fileName, Body=zipf.read(file))
                print(putFile)

How can I iterate through the whole zip and capture the paths/objects of the xml files inside it (even when they sit in any folder of the zip), so that I can parse each xml with xml.dom.minidom?
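A minimal sketch of one possible approach (not from the original post): once the zip object has been pulled into a BytesIO buffer as above, zipfile's namelist() already returns every member path no matter how deep the folder is, so the xml members can be filtered by suffix and fed straight to parseString. The bucket name my-bucket, the key main.zip, and the helper name parse_xml_members are placeholders; the cfdi:Comprobante / Fecha handling just mirrors the code shown later in the question.

import io
import zipfile
from xml.dom.minidom import parseString

import boto3

s3 = boto3.client('s3')

def parse_xml_members(bucket, key):
    # Download the zip object into memory
    obj = s3.get_object(Bucket=bucket, Key=key)
    with io.BytesIO(obj['Body'].read()) as tf:
        tf.seek(0)
        with zipfile.ZipFile(tf, mode='r') as zipf:
            # namelist() lists members in every subfolder of the zip
            for name in zipf.namelist():
                if not name.lower().endswith('.xml'):
                    continue
                print('parsing', name)
                doc = parseString(zipf.read(name))
                # Example: read the Fecha attribute from cfdi:Comprobante nodes
                for node in doc.getElementsByTagName('cfdi:Comprobante'):
                    print(node.getAttribute('Fecha')[2:4])

# Hypothetical usage with placeholder bucket/key:
# parse_xml_members('my-bucket', 'main.zip')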

I have to look in the folder and get the xml files with the following:

from xml.dom.minidom import parse, parseString
import os
import boto3

def lambda_handler(event, context):
    # TODO implement
    s3 = boto3.client('s3')
    datos = []
    if event:
        print('event:', event)
        # Only the object that triggered the event is read here
        file_obj = event['Records'][0]
        filename = str(file_obj['s3']['object']['key'])
        print('filename', filename)
        fileobj = s3.get_object(Bucket='mi-informacion-audit', Key=filename)
        file_content = fileobj['Body'].read().decode('utf-8')
        #print(file_content)
        datos.append(file_content.replace('\n', ''))
    for x in range(len(datos)):
        print(datos[x])
        doc = parseString(datos[x])
        nodeListCom = doc.getElementsByTagName('cfdi:Comprobante')
        for i in range(len(nodeListCom)):
            # Take the two-digit year from the Fecha attribute
            pk_xmlyear = nodeListCom[i].getAttribute('Fecha')[2:4]
            print(pk_xmlyear)
    return 'Hello from Lambda'

But this folder contains 10 xml files and my code only reads the first one... any help?
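Not part of the original post, but one likely reason only one file is read: the handler inspects only event['Records'][0], and an S3 event notification usually carries just the object that triggered it. A minimal sketch, assuming the 10 xml files sit under a common prefix (the Open/ prefix, the function name read_all_xml, and the parsing logic copied from the handler above are all assumptions), would list the keys with list_objects_v2 and parse each one:

import boto3
from xml.dom.minidom import parseString

s3 = boto3.client('s3')

def read_all_xml(bucket='mi-informacion-audit', prefix='Open/'):
    # List every object under the prefix instead of relying on the single
    # object carried by the triggering event (prefix is a placeholder here)
    datos = []
    paginator = s3.get_paginator('list_objects_v2')
    for page in paginator.paginate(Bucket=bucket, Prefix=prefix):
        for item in page.get('Contents', []):
            key = item['Key']
            if not key.lower().endswith('.xml'):
                continue
            body = s3.get_object(Bucket=bucket, Key=key)['Body'].read().decode('utf-8')
            datos.append(body.replace('\n', ''))
    for contenido in datos:
        doc = parseString(contenido)
        for node in doc.getElementsByTagName('cfdi:Comprobante'):
            print(node.getAttribute('Fecha')[2:4])
    return datos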

0 Answers:

There are no answers yet