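I have a structured text file containing many multi-line records. Each record is supposed to have a unique key field. I need to read a series of these files, find the key fields that are not unique, and replace their values with unique ones.
My script already identifies all the fields that need replacing. I store them in a dictionary in which each key is a non-unique field value and each value is a list of unique replacement values.
For example: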
{
"1111111111" : ["1234566363", "5533356775", "6443458343"]
}
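What I want to do is read each file only once, find every occurrence of "1111111111" (the dict key), and replace the first match with the first value in the list, the second match with the second value, and so on.
I'm trying to use a regular expression, but I'm not sure how to construct a suitable one without looping over the file multiple times.
Here is my code so far: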
def multireplace(Text, Vars):
    dictSorted = sorted(Vars, key=len, reverse=True)
    regEx = re.compile('|'.join(map(re.escape, dictSorted)))
    return regEx.sub(lambda match: Vars[match.group(0)], Text)

text = multireplace(text, find_replace_dict)
It works for a single key: value combination, but it fails when the value is a list:
return regEx.sub(lambda match: Vars[match.group(0)], Text , 1)
TypeError: sequence item 1: expected str instance, list found
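Can the function be changed so that it does not loop over the file multiple times?
Answer 0 (score: 1)
Take a look and read the comments. Let me know if anything doesn't make sense: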
import re

def replace(text, replacements):
    # Copy each list so that popping from it does not modify the caller's dict.
    # (A plain dict.copy() is shallow and would still share the same lists.)
    replacements = {key: list(values) for key, values in replacements.items()}
    # This is essentially what you had already: longest keys first, so a key
    # that is a prefix of another key cannot shadow it.
    keys = sorted(replacements, key=len, reverse=True)
    regex = re.compile("|".join(map(re.escape, keys)))
    # In our lambda, we pop the first element from the list. This way, each
    # time we're called with the same key, we get the next replacement.
    return regex.sub(lambda m: replacements[m.group(0)].pop(0), text)

print(replace("A A B B A B", {"A": ["A1", "A2", "A3"], "B": ["B1", "B2", "B3"]}))
# Output:
# A1 A2 B1 B2 A3 B3
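If you would rather not pop from lists at all, the same idea can be sketched with one iterator per key. This is only an alternative sketch, not something from the question: the name `replace_with_iterators` is made up for illustration, and it assumes every key has at least as many replacements as matches (otherwise `next()` raises `StopIteration`).

```python
import re

def replace_with_iterators(text, replacements):
    # One independent iterator per key; the caller's lists are never modified.
    pending = {key: iter(values) for key, values in replacements.items()}
    # Longest keys first, so a key that is a prefix of another cannot shadow it.
    keys = sorted(pending, key=len, reverse=True)
    regex = re.compile("|".join(map(re.escape, keys)))
    # next() hands out the following replacement for the matched key and
    # raises StopIteration once that key has none left.
    return regex.sub(lambda m: next(pending[m.group(0)]), text)

data = {"1111111111": ["1234566363", "5533356775", "6443458343"]}
print(replace_with_iterators("id=1111111111 id=1111111111", data))
# Output:
# id=1234566363 id=5533356775
```

Since `regex.sub` walks the text once from left to right, each file's contents are still scanned only once, and `data` can be reused afterwards because nothing was removed from it.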
Update
To help track down the problem described in the comments below, try this version, which tells you exactly which string ran out of replacements:
import re

def replace(text, replacements):
    # Let's use a named function so we can do a little more than the lambda.
    def make_replacement(match):
        try:
            return replacements[match.group(0)].pop(0)
        except IndexError:
            # Print out debug info about what happened.
            print("Ran out of replacements for {}".format(match.group(0)))
            # Re-raise so the process still exits.
            raise

    # Copy each list so that popping from it does not modify the caller's dict.
    # (A plain dict.copy() is shallow and would still share the same lists.)
    replacements = {key: list(values) for key, values in replacements.items()}
    # This is essentially what you had already: longest keys first, so a key
    # that is a prefix of another key cannot shadow it.
    keys = sorted(replacements, key=len, reverse=True)
    regex = re.compile("|".join(map(re.escape, keys)))
    # make_replacement pops the first element from the list, so each call
    # for the same key gets the next replacement.
    return regex.sub(make_replacement, text)

print(replace("A A B B A B A", {"A": ["A1", "A2", "A3"], "B": ["B1", "B2", "B3"]}))
# Output:
# Ran out of replacements for A
# ...followed by an IndexError traceback, because the fourth "A" has no
# replacement left.
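If stopping with an exception is not what you want, one option is to leave the extra matches untouched and report which keys ran short. The following is a rough sketch under that assumption; `replace_lenient` and its `(text, exhausted_keys)` return value are hypothetical choices for illustration, not part of the question's code.

```python
import re

def replace_lenient(text, replacements):
    # Private copies of the lists, so the caller's dict is left intact.
    pending = {key: list(values) for key, values in replacements.items()}
    exhausted = set()

    def make_replacement(match):
        key = match.group(0)
        if pending[key]:
            return pending[key].pop(0)
        # No replacement left: remember the key and keep the original text.
        exhausted.add(key)
        return key

    # Longest keys first, so a key that is a prefix of another cannot shadow it.
    keys = sorted(pending, key=len, reverse=True)
    regex = re.compile("|".join(map(re.escape, keys)))
    return regex.sub(make_replacement, text), exhausted

new_text, exhausted = replace_lenient(
    "A A B B A B A", {"A": ["A1", "A2", "A3"], "B": ["B1", "B2", "B3"]}
)
print(new_text)
# Output:
# A1 A2 B1 B2 A3 B3 A
print(exhausted)
# Output:
# {'A'}
```

Whether leaving a duplicate key in place is acceptable depends on your data. If every key must end up unique, the raising version is probably the safer default.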