I have an input file in the following format:
<ftnt>
<p><su>1</su> aaaaaaaaaaa </p>
</ftnt>
...........
...........
...........
... the <su>1</su> is available in the .........
I need to convert it to the following format by substituting the values and deleting the entire contents of the ftnt tag:
"""...
...
... the aaaaaaaaaaa is available in the ..........."""
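Please find below the code I have written. Initially I saved the keys and values in a dictionary, and then tried to replace the values based on the key using grouping.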
import re
dict = {}
in_file = open("in.txt", "r")
outfile = open("out.txt", "w")
File1 = in_file.read()
infile1 = File1.replace("\n", " ")
for mo in re.finditer(r'<p><su>(\d+)</su>(.*?)</p>',infile1):
    dict[mo.group(1)] = mo.group(2)
subval = re.sub(r'<p><su>(\d+)</su>(.*?)</p>','',infile1)
subval = re.sub('<su>(\d+)</su>',dict[\\1], subval)
outfile.write(subval)
I tried to use the dictionary in re.sub, but I get a KeyError. I don't know why this happens. Could you tell me how to use it? I'd appreciate any help here.
Answer 0 (score: 0)
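First, don't name your dictionary dict, or you'll shadow the built-in dict type. Second, \\1 doesn't work outside a string, hence the syntax error. Even when it is quoted, dict['\\1'] is evaluated once, before re.sub runs, so it looks up the literal two-character string \1 rather than the captured number, which is where a KeyError comes from. I think the best bet is to take advantage of str.format: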
import re
# store the substitutions
subs = {}
# read the data, joining lines so a tag can span a line break
with open("in.txt", "r") as in_file:
    contents = in_file.read().replace("\n", " ")
# save some regexes for later (non-greedy, so each ftnt block is matched on its own)
ftnt_tag = re.compile(r'<ftnt>.*?</ftnt>')
var_tag = re.compile(r'<p><su>(\d+)</su>(.*?)</p>')
# pull every ftnt block out
ftnt = " ".join(ftnt_tag.findall(contents))
contents = ftnt_tag.sub('', contents)
# pull the su number and text out of each footnote
for match in var_tag.finditer(ftnt):
    # added s so they aren't numbers, useful for format
    subs["s" + match.group(1)] = match.group(2)
# replace <su>1</su> with {s1}
contents = re.sub(r"<su>(\d+)</su>", r"{s\1}", contents)
# now that the <su> are the keys, we can just use str.format
with open("out.txt", "w") as out_file:
    out_file.write(contents.format(**subs))
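As a possible alternative, and closer to what the question was attempting: re.sub also accepts a function as the replacement argument and calls it once per match, so the dictionary lookup can use the number that was actually captured. The following is only a minimal sketch under the assumption that the input looks like the sample in the question. The names inline_footnotes, notes and lookup are made up for illustration, and it strips the whitespace around each footnote text, which the code above does not do.

```python
import re

FTNT_BLOCK = re.compile(r"<ftnt>.*?</ftnt>", re.DOTALL)
FTNT_ENTRY = re.compile(r"<p><su>(\d+)</su>(.*?)</p>", re.DOTALL)
SU_REF = re.compile(r"<su>(\d+)</su>")


def inline_footnotes(text):
    """Replace each <su>N</su> reference with the text of footnote N."""
    # Collect footnote number -> footnote text from every <ftnt> block.
    notes = {}
    for block in FTNT_BLOCK.findall(text):
        for entry in FTNT_ENTRY.finditer(block):
            # Assumption: surrounding whitespace is not wanted in the output.
            notes[entry.group(1)] = entry.group(2).strip()

    # Drop the <ftnt> blocks themselves.
    body = FTNT_BLOCK.sub("", text)

    # re.sub calls this once per match; references with no matching
    # footnote are left unchanged instead of raising KeyError.
    def lookup(match):
        return notes.get(match.group(1), match.group(0))

    return SU_REF.sub(lookup, body)


sample = (
    "<ftnt>\n<p><su>1</su> aaaaaaaaaaa </p>\n</ftnt>\n"
    "... the <su>1</su> is available in the ..."
)
result = inline_footnotes(sample)
```

Because this version never goes through str.format, literal { or } characters in the file should not need escaping, and re.DOTALL makes the newline replacement step unnecessary.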