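Based on the thread "Replacing a line in a file based on a keyword search, by line from another file", I ran into some difficulty with my real files. I want to search for the keyword "PBUSH followed by a number" (the numbers keep increasing) and, based on that keyword, check whether it exists in another file. If it exists, replace the data in the line "PBUSH number K some decimals" with the line found in the other file, keeping the search keyword the same. This continues until the end of the file.
My modified code (note the findall and sub patterns) is: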
import re
path1 = "C:\Users\sony\Desktop\PBUSH1.BDF"
path2 = "C:\Users\sony\Desktop\PBUSH2.BDF"
with open(path1) as f1, open(path2) as f2:
    dat1 = f1.read()
    dat2 = f2.read()
matches = re.findall('^PBUSH \s [0-9] \s K [0-9 ]+', dat1, flags=re.MULTILINE)
for match in matches:
    dat2 = re.sub('^{} \s [0-9] \s K \s'.format(match.split(' ')[0]), match, dat2, flags=re.MULTILINE)
with open(path2, 'w') as f:
    f.write(dat2)
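Here my search keyword is PBUSH, a space and the number; the rest of the line is the data on the PBUSH line. I can't get it to work. What could be the reason?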
Answer (score: 0)
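It is better to use groups here and split each matched line into two parts: one for the matching phrase ("PBUSH" plus the number) and one for the data. In your version, match.split(' ')[0] is always just "PBUSH", so the sub pattern is not tied to a particular number, and because that pattern stops right after K, only the start of each line is replaced while the old data stays in place.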
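As a quick check (a sketch using two hypothetical sample lines, not data from the question's files), the question's findall pattern matches neither line, while a grouped pattern splits a line into the "PBUSH <id> " key and the data that follows:

```python
import re

# Hypothetical sample lines (free-field style); real .BDF files may
# space their fields differently.
single_spaced = "PBUSH 1 K 1000.0 2000.0"
two_digit_id = "PBUSH   12   K   1000.0   2000.0"

# Pattern from the question. The literal spaces around each \s demand
# exactly three whitespace characters between PBUSH, the ID and K, and
# [0-9] accepts a single digit only, so neither line matches.
question_pattern = r'^PBUSH \s [0-9] \s K [0-9 ]+'
print(re.search(question_pattern, single_spaced))  # None
print(re.search(question_pattern, two_digit_id))   # None

# Grouped pattern: group 1 is the "PBUSH <id> " key, group 2 the data.
grouped = re.compile(r'^(PBUSH[ \t]+[0-9]+[ \t]+)(.*)$', flags=re.MULTILINE)
m = grouped.search(two_digit_id)
print(repr(m.group(1)))  # 'PBUSH   12   '
print(repr(m.group(2)))  # 'K   1000.0   2000.0'
```

Using [ \t]+ rather than \s+ is a choice made in this sketch so that a match cannot continue past the end of a line onto the next one.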
import re

# Use raw strings for Windows paths; otherwise "\U" in "C:\Users"
# is read as a Unicode escape and Python 3 raises a SyntaxError
input1 = r"C:\Users\sony\Desktop\PBUSH1.BDF"
input2 = r"C:\Users\sony\Desktop\PBUSH2.BDF"

with open(input1) as f1, open(input2) as f2:
    dat1 = f1.read()
    dat2 = f2.read()

# Use finditer instead of findall so that we get a match object for
# each match. Each matching line has one subgroup containing the
# "PBUSH NNN " part, while the whole regex matches up to the end of
# that line. [ \t]+ is used instead of \s+ so a match cannot run
# past a newline onto the next line.
matches = re.finditer(r'^(PBUSH[ \t]+[0-9]+[ \t]+).*$', dat1, flags=re.MULTILINE)
for match in matches:
    # For each match, build a regex such as "^PBUSH 123 .*$" and
    # replace every line it matches with the whole line from file 1.
    # re.escape() guards the prefix; the lambda inserts the line
    # literally, so backslashes in it are not treated as escapes.
    line = match.group(0)
    dat2 = re.sub('^{}.*$'.format(re.escape(match.group(1))),
                  lambda m: line, dat2, flags=re.MULTILINE)

with open(input2, 'w') as outf:
    outf.write(dat2)
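If the second file has many PBUSH lines, running one re.sub per match rescans the whole file each time. A possible variation (a sketch, not the answer's code; replace_pbush_lines and PBUSH_LINE are hypothetical names) collects the source lines into a dict keyed by PBUSH ID and rewrites the target in a single pass with a replacement function:

```python
import re

# Group 1 = "PBUSH <id> " prefix, group 2 = the numeric ID.
# [ \t] instead of \s keeps each match on a single line.
PBUSH_LINE = re.compile(r'^(PBUSH[ \t]+([0-9]+)[ \t]+).*$', flags=re.MULTILINE)


def replace_pbush_lines(source_text, target_text):
    """Replace each PBUSH line in target_text whose ID also appears in
    source_text with the full line from source_text."""
    # Map each PBUSH ID in the source to its complete line.
    new_lines = {m.group(2): m.group(0) for m in PBUSH_LINE.finditer(source_text)}

    def swap(m):
        # Keep the target line unchanged when its ID is not in the source.
        # A function's return value is inserted literally by re.sub.
        return new_lines.get(m.group(2), m.group(0))

    # One pass over the target instead of one re.sub call per match.
    return PBUSH_LINE.sub(swap, target_text)


old = "PBUSH 1 K 1.0\nPBUSH 2 K 2.0\nGRID 5\n"
new = "PBUSH 2 K 9.5 9.5\n"
print(replace_pbush_lines(new, old))
```

To use it with the files above, read dat1 and dat2 as in the answer and write replace_pbush_lines(dat1, dat2) back to input2. Keying on the ID alone means different spacing between the two files does not prevent a match; if the same ID appears twice in the source, the last line wins.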