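I have a problem: I want to inspect a zip file that gets dropped into a bucket. It has the following structure:
main.zip
  ==> main/folder1/subfolder/file.xml
  ==> main/folder2/file.xml
Using a Lambda function, I need to capture the paths of the XML files inside the zip and then parse them with xml.dom.minidom.
This is what I'm doing: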
import io
import zipfile
import urllib.parse
import boto3

s3 = boto3.client('s3')

def lambda_handler(event, context):
    # Python 3: unquote_plus lives in urllib.parse and takes a str
    key = urllib.parse.unquote_plus(event['Records'][0]['s3']['object']['key'])
    bucket = urllib.parse.unquote_plus(event['Records'][0]['s3']['bucket']['name'])
    try:
        obj = s3.get_object(Bucket=bucket, Key=key)
        with io.BytesIO(obj["Body"].read()) as tf:
            # rewind the file
            tf.seek(0)
            # Read the file as a zipfile and process the members
            with zipfile.ZipFile(tf, mode='r') as zipf:
                for file in zipf.infolist():
                    fileName = file.filename
                    putFile = s3.put_object(Bucket=bucket, Key='Open/'+fileName, Body=zipf.read(file))
                    print(putFile)
    except Exception as e:
        # Log the failure and re-raise so Lambda reports the error
        print(e)
        raise
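How can I iterate over the whole zip and capture the paths/objects of the XML files in it (even when they sit inside any folder of the zip), so that I can parse the XML with xml.dom.minidom?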
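One possible approach, as a minimal sketch: since `infolist()` already returns every member of the archive with its full path, regardless of folder depth, it may be enough to filter on the `.xml` suffix and pass each member's bytes to `parseString`. The helper name `parse_xml_members` is hypothetical, and the sketch assumes the whole zip fits in memory (as in the code above).

```python
import io
import zipfile
from xml.dom.minidom import parseString


def parse_xml_members(zip_bytes):
    """Return {member_path: minidom Document} for every .xml file in the archive.

    `parse_xml_members` is a hypothetical helper name, not part of any library.
    """
    parsed = {}
    with zipfile.ZipFile(io.BytesIO(zip_bytes), mode='r') as zipf:
        for info in zipf.infolist():
            # Skip directory entries and anything that is not an XML file.
            if info.is_dir() or not info.filename.lower().endswith('.xml'):
                continue
            # zipf.read returns bytes, which parseString accepts directly.
            parsed[info.filename] = parseString(zipf.read(info))
    return parsed
```

Inside the handler this could be called as `parse_xml_members(obj["Body"].read())`; the dictionary keys are then the paths inside the zip, such as `main/folder1/subfolder/file.xml`.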
I have to look in a folder and get the XML files using the following:
from xml.dom.minidom import parse, parseString
import os
import boto3

def lambda_handler(event, context):
    # TODO implement
    s3 = boto3.client('s3')
    datos = []
    if event:
        print('event:', event)
        # Takes the first record of the event
        file_obj = event['Records'][0]
        filename = str(file_obj['s3']['object']['key'])
        print('filename', filename)
        fileobj = s3.get_object(Bucket='mi-informacion-audit', Key=filename)
        # 'Body' is a top-level key of the get_object response
        file_content = fileobj['Body'].read().decode('utf-8')
        #print(file_content)
        datos.append(file_content.replace('\n', ''))
    for x in range(len(datos)):
        print(datos[x])
        doc = parseString(datos[x])
        nodeListCom = doc.getElementsByTagName('cfdi:Comprobante')
        # Separate index so the outer loop variable is not overwritten
        for i in range(len(nodeListCom)):
            # Characters 2-3 of 'Fecha' are the two-digit year
            pk_xmlyear = nodeListCom[i].getAttribute('Fecha')[2:4]
            print(pk_xmlyear)
    return 'Hello from Lambda'
But this folder contains 10 XML files, and my code only reads the first one... any help?
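A possible cause is that the handler only looks at `event['Records'][0]`. As a rough sketch, assuming the event can carry more than one record, the handler could walk the whole `Records` list instead. The helper names `iter_s3_objects` and `invoice_years` are hypothetical, introduced here only for illustration.

```python
from urllib.parse import unquote_plus
from xml.dom.minidom import parseString


def iter_s3_objects(event):
    """Yield (bucket, key) for every record in an S3 notification event."""
    for record in event.get('Records', []):
        bucket = record['s3']['bucket']['name']
        # Object keys arrive URL-encoded in S3 events (spaces become '+').
        key = unquote_plus(record['s3']['object']['key'])
        yield bucket, key


def invoice_years(xml_text):
    """Return the two-digit year of each cfdi:Comprobante 'Fecha' attribute."""
    doc = parseString(xml_text)
    nodes = doc.getElementsByTagName('cfdi:Comprobante')
    return [node.getAttribute('Fecha')[2:4] for node in nodes]
```

In the handler, each `(bucket, key)` pair would be passed to `s3.get_object` and the decoded body to `invoice_years`. If each upload triggers its own invocation with a single record, the ten files would instead be handled by ten separate invocations; another option, not shown here, is to list the folder with `s3.list_objects_v2(Bucket=..., Prefix=...)` and fetch each key.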