I'm struggling with AWS Lambda and boto3: I want to read a file line by line and replace specific expressions in each line.
import boto3

s3 = boto3.client('s3')

def lambda_handler(event, context):
    print(event)
    # Bucket and key of the object that triggered the S3 event
    bucket = event['Records'][0]['s3']['bucket']['name']
    key = event['Records'][0]['s3']['object']['key']
    obj = s3.get_object(Bucket=bucket, Key=key)
    # Read the whole object, decode it, and log only the lines containing "ABC"
    for text in obj['Body'].read().decode('utf-8').splitlines():
        if "ABC" in text:
            print(text)
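The code runs fine, and the log shows only the lines I'm interested in. Now I'm trying to replace some expressions in those lines, but neither `replace` nor `sub` seems to work:
Example line: ABC <123> <abg 46547> <!ab123>
Desired result: ABC_123_46547_ab123
Does boto3 have any regular expression support for replacing parts of a line? Thanks for your help!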
Answer 0 (score: 0)
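Apart from the single example you gave, you haven't stated any specific rules for replacing the strings, so I've had to guess at your intent.
Below are a few options. The first is a brute-force approach that performs only literal replacements. The second and third use regular expressions for a more general and extensible approach.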
import re
# in: ABC <123> <abg 46547> <!ab123>
# out: ABC_123_46547_ab123
#
# Need to substitute the following:
# " <abg " with "_"
# " <!" with "_"
# " <" with "_"
# ">" with ""
# ------------------------------------------------
# 1st option
# ------------------------------------------------
s1 = "ABC <123> <abg 46547> <!ab123>"
s2 = s1 \
    .replace(" <abg ", "_") \
    .replace(" <!", "_") \
    .replace(" <", "_") \
    .replace(">", "")
print("Option #1: literal")
print("\tbefore : {}".format(s1))
print("\tafter : {}".format(s2))
# ------------------------------------------------
# 2nd option
# ------------------------------------------------
s3 = s1
replacements_literal = [
    (" <abg ", "_"),
    (" <!", "_"),
    (" <", "_"),
    (">", "")
]
# Apply each (pattern, replacement) pair in order
for old, new in replacements_literal:
    s3 = re.sub(old, new, s3)
print("\nOption #2: literal, with loop")
print("\tbefore : {}".format(s1))
print("\tafter : {}".format(s3))
# ------------------------------------------------
# 3rd option
# ------------------------------------------------
s4 = s1
replacements_regex = [
    (" *<[a-z]+ *", "_"),
    (" *<!", "_"),
    (" *<", "_"),
    (">", "")
]
# Order matters: the more specific patterns run before the generic " *<"
for old, new in replacements_regex:
    s4 = re.sub(old, new, s4)
print("\nOption #3: regex, with loop")
print("\tbefore : {}".format(s1))
print("\tafter : {}".format(s4))
The output looks like this:
Option #1: literal
    before : ABC <123> <abg 46547> <!ab123>
    after : ABC_123_46547_ab123

Option #2: literal, with loop
    before : ABC <123> <abg 46547> <!ab123>
    after : ABC_123_46547_ab123

Option #3: regex, with loop
    before : ABC <123> <abg 46547> <!ab123>
    after : ABC_123_46547_ab123
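To answer the boto3 part directly: boto3 only fetches the object, and the substitution itself is done with Python's standard `re` module. As a rough sketch of how the options above could be folded into a single pass and applied to the text read from S3, something like the following might work. The helper names `normalize_line` and `transform_text` are hypothetical, and the pattern is only my guess at the rule behind your one example (an optional lowercase word followed by a space, or an optional `!`, right after the `<`).

```python
import re

# Assumed rule, inferred from the single example line:
# optional spaces, "<", then optionally either a lowercase word plus space(s) or "!"
_OPENING = re.compile(r" *<(?:[a-z]+ +|!)?")


def normalize_line(line):
    """Replace each opening marker with "_" and drop the closing ">"."""
    return _OPENING.sub("_", line).replace(">", "")


def transform_text(text, marker="ABC"):
    """Return the normalized form of every line that contains `marker`."""
    return [normalize_line(line) for line in text.splitlines() if marker in line]


if __name__ == "__main__":
    sample = "ABC <123> <abg 46547> <!ab123>\nsome other line\n"
    for line in transform_text(sample):
        print(line)
```

Inside the handler from the question, the decoded body could then be passed straight in, for example `transform_text(obj['Body'].read().decode('utf-8'))`. Unlike the third option, this pattern requires a space after the lowercase word, so a token such as `<ab123>` keeps its letters. If you stay with literal strings in a `re.sub` loop, wrapping them in `re.escape` guards against characters that have a special meaning in regular expressions.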