Read a file line by line and replace parts of each line with boto3 in AWS Lambda

Time: 2018-11-23 16:17:35

Tags: amazon-web-services replace aws-lambda boto3

I am struggling with boto3 in AWS Lambda: I want to read a file line by line and replace specific expressions in each line.

import boto3

s3 = boto3.client('s3')

def lambda_handler(event, context):
    print(event)

    # Bucket and key of the S3 object that triggered this invocation
    bucket = event['Records'][0]['s3']['bucket']['name']
    key = event['Records'][0]['s3']['object']['key']

    obj = s3.get_object(Bucket=bucket, Key=key)

    # Read the whole object, decode it, and log only the lines of interest
    for text in obj['Body'].read().decode('utf-8').splitlines():
        if "ABC" in text:
            print(text)

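The code runs fine, and the log shows only the lines I am interested in. Now I am trying to replace some expressions in each of those lines, but neither replace nor sub worked: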

Example line: ABC <123> <abg 46547> <!ab123>

Desired result: ABC_123_46547_ab123

Does boto3 offer any regular-expression support for replacing parts of a line? Thanks for your help!

1 Answer:

Answer 0 (score: 0)

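Apart from the single example you provided, you have not stated any specific rules for replacing the strings, so I have had to guess at your intent.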

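Here are a few options. The first is a brute-force approach that performs only literal replacements with str.replace. The second and third use re.sub for a more general and extensible approach: the second still matches literal strings, while the third uses real regular-expression patterns.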

import re

# in:  ABC  <123>  <abg 46547>  <!ab123>
# out: ABC_123_46547_ab123
#
# Need to substitute the following:
# "  <abg " with "_"
# "  <!" with "_"
# "  <" with "_"
# ">" with ""

# ------------------------------------------------
# 1st option
# ------------------------------------------------
s1 = "ABC  <123>  <abg 46547>  <!ab123>"

s2 = s1 \
    .replace("  <abg ", "_") \
    .replace("  <!", "_") \
    .replace("  <", "_") \
    .replace(">", "")

print("Option #1: literal")
print("\tbefore : {}".format(s1))
print("\tafter  : {}".format(s2))

# ------------------------------------------------
# 2nd option
# ------------------------------------------------
s3 = s1

replacements_literal = [
    ("  <abg ", "_"),
    ("  <!", "_"),
    ("  <", "_"),
    (">", "")
]

for old, new in replacements_literal:
    s3 = re.sub(old, new, s3)

print("\nOption #2: literal, with loop")
print("\tbefore : {}".format(s1))
print("\tafter  : {}".format(s3))

# ------------------------------------------------
# 3rd option
# ------------------------------------------------
s4 = s1

replacements_regex = [
    (" *<[a-z]+ *", "_"),
    (" *<!", "_"),
    (" *<", "_"),
    (">", "")
]

for old, new in replacements_regex:
    s4 = re.sub(old, new, s4)

print("\nOption #3: regex, with loop")
print("\tbefore : {}".format(s1))
print("\tafter  : {}".format(s4))

The output looks like this:

Option #1: literal
        before : ABC  <123>  <abg 46547>  <!ab123>
        after  : ABC_123_46547_ab123

Option #2: literal, with loop
        before : ABC  <123>  <abg 46547>  <!ab123>
        after  : ABC_123_46547_ab123

Option #3: regex, with loop
        before : ABC  <123>  <abg 46547>  <!ab123>
        after  : ABC_123_46547_ab123
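
To connect this back to the Lambda handler in the question: boto3 only fetches the object from S3, and the text processing is ordinary Python, so the replacement comes from the standard re module rather than from boto3. Below is a minimal sketch that reuses the option #3 rules on the decoded body. The names REPLACEMENTS, rewrite_line, rewrite_matching_lines and sample_body are hypothetical helpers introduced here for illustration, and sample_body merely stands in for obj['Body'].read().decode('utf-8').

```python
import re

# Same ordered rules as option #3. Order matters: the generic " *<" rule
# must run after the more specific "<abg " and "<!" rules.
REPLACEMENTS = [
    (re.compile(r" *<[a-z]+ *"), "_"),
    (re.compile(r" *<!"), "_"),
    (re.compile(r" *<"), "_"),
    (re.compile(r">"), ""),
]


def rewrite_line(line):
    """Apply every replacement rule, in order, to a single line."""
    for pattern, new in REPLACEMENTS:
        # sub() returns a new string; strings are immutable, so the
        # result has to be assigned back.
        line = pattern.sub(new, line)
    return line


def rewrite_matching_lines(text, marker="ABC"):
    """Return the rewritten form of every line that contains the marker."""
    return [rewrite_line(line) for line in text.splitlines() if marker in line]


# Hypothetical stand-in for the decoded S3 object body.
sample_body = "ABC <123> <abg 46547> <!ab123>\nXYZ <1>\n"
print(rewrite_matching_lines(sample_body))
```

Inside lambda_handler, the loop from the question would then call rewrite_line(text) on each line that contains "ABC". Two things to watch for when combining the snippets: the answer's script reuses the name s3 for a string, which would shadow the boto3 client if pasted into the same module, and the literal rules of options #1 and #2 expect two spaces before each "<", whereas the " *" patterns of option #3 accept any number of spaces.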