I am trying to split a file into multiple smaller files. The logic works for a single file without Lambda, but once I add the code to trigger it from Lambda, the script runs in a loop without completing and writes the files incorrectly.
From my debugging so far, the outer for loop executes multiple times, even though only one file fired the trigger.
Logic flow:
A file lands in /bigfile/, the Lambda triggers and attempts to split the file according to the logic, placing the smaller files in /splitfiles/.
File contents:
ABC|filename1.DAT|123
CDE|filename2.DAT|8910
XYZ|filename3.DAT|456
FGH|filename4.DAT|4545
Output:
File 1:
ABC|filename1.DAT|123
CDE|filename2.DAT|8910
File 2:
XYZ|filename3.DAT|456
FGH|filename4.DAT|4545
Code:
import boto3
import os

s3client = boto3.client('s3')
s3 = boto3.resource('s3')

def lambda_handler(event, context):
    try:
        for record in event['Records']:
            bucket = record['s3']['bucket']['name']
            key = record['s3']['object']['key']
            print(key)
            obj = s3.Object(bucket, key)
            # Body.read() returns bytes; decode before splitting into lines
            linesplit = obj.get()['Body'].read().decode('utf-8').split('\n')
            lines_per_file = 2  # number of lines per file
            created_files = 0
            sfilelines = ''
            for rownum, line in enumerate(linesplit, start=1):
                sfilelines = sfilelines + '\n' + line
                if rownum % lines_per_file == 0:
                    cnt = lines_per_file * (created_files + 1)
                    body_contents = str(sfilelines)
                    file_name = "%s_%s.DAT" % ('Testfile', cnt)
                    target_file = "splitfiles/" + file_name
                    print(target_file)
                    s3client.put_object(ACL='public-read', ServerSideEncryption='AES256',
                                        Bucket=bucket, Key=target_file,
                                        Body=body_contents)
                    sfilelines = ''  # reset the buffer
                    created_files += 1  # one more small file has been created
            if sfilelines:  # write any pending lines that were not written yet
                cnt = lines_per_file * (created_files + 1)
                body_contents = str(sfilelines)
                file_name = "%s_%s.DAT" % ('Testfile', cnt)
                target_file = "splitfiles/" + file_name
                print(target_file)
                s3client.put_object(ACL='public-read', ServerSideEncryption='AES256',
                                    Bucket=bucket, Key=target_file,
                                    Body=body_contents)
                created_files += 1
            print('%s split files (with <= %s lines each) were created.' % (created_files, lines_per_file))
    except Exception as e:
        print(e)
Answer 0 (score: 1)
Depending on how you defined the Lambda trigger, you may get more than one Lambda invocation per file, i.e. on different S3 object lifecycle events.
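A likely aggravating factor, given the code above: the split files are written back into the same bucket that fires the trigger. If the event notification covers the whole bucket (all s3:ObjectCreated:* events, no key prefix filter), every object the function writes to splitfiles/ re-invokes the Lambda, which would produce exactly the looping behaviour described. A minimal sketch of a guard at the top of the handler, assuming the incoming files land under the bigfile/ prefix as stated in the question:

import urllib.parse

def lambda_handler(event, context):
    for record in event['Records']:
        bucket = record['s3']['bucket']['name']
        # Keys in S3 event records are URL-encoded
        key = urllib.parse.unquote_plus(record['s3']['object']['key'])
        # Only process objects under the input prefix; skip everything else,
        # including the split files this function writes itself
        if not key.startswith('bigfile/'):
            print('Skipping %s: outside bigfile/' % key)
            continue
        # ... splitting logic from the question goes here ...

The cleaner fix is to narrow the trigger itself: give the S3 event notification a key-name prefix filter of bigfile/ and subscribe to a single event type such as s3:ObjectCreated:Put rather than s3:ObjectCreated:*, so the function is never invoked for its own output. A sketch using boto3 (bucket name, configuration Id, and function ARN are placeholders):

import boto3

s3client = boto3.client('s3')
s3client.put_bucket_notification_configuration(
    Bucket='my-bucket',  # placeholder bucket name
    NotificationConfiguration={
        'LambdaFunctionConfigurations': [{
            'Id': 'split-bigfiles',
            'LambdaFunctionArn': 'arn:aws:lambda:region:account-id:function:splitter',  # placeholder ARN
            'Events': ['s3:ObjectCreated:Put'],
            'Filter': {'Key': {'FilterRules': [
                {'Name': 'prefix', 'Value': 'bigfile/'}
            ]}}
        }]
    }
)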