Question

This is my first question here. I have done my research but could not find anything really similar.

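The main goal of the script: I want it to scan every line of a text file against a regular expression. If a line matches, the current line and an incrementing index are added to a dictionary. At EOF, the now-populated dictionary should be written to a new file.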

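Current problem: when the for loop scans the lines, the dictionary never seems to get more than one entry, even though the scan actually finds multiple matches (confirmed with a simple print statement whenever a match is true). What am I missing?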

for inputfile in inputfiles:
    print("Processing " + inputfile)

    inputfile = os.path.join(filespath, inputfile)

    with open(inputfile, "r", encoding="UTF-8") as infile:
        alllines = infile.readlines()

    matched_lines = {}
    int_index = 1
    indexer = str(int_index).zfill(5)
    for line in alllines:
        if re.search(match_string, line, flags=0):
            matched_lines[indexer] = line
            int_index += 1
    print(matched_lines.items())

This is the output:

Processing testfile1.txt
dict_items([('00001', 'Zeile 5\n')])

But this "Zeile 5\n" (the regex to match is 5$) occurs several times in the text file being scanned. The file looks like this:

Zeile 3
Zeile 4
Zeile 5

Zeile 1
Zeile 2
Zeile 3
Zeile 4
Zeile 5

Zeile 1
Zeile 2
Zeile 3
Zeile 4
Zeile 5

Zeile 1
Zeile 2
Zeile 3
Zeile 4
Zeile 5

Zeile 1
Zeile 2
Zeile 3

etc.

Any ideas?

Answer 1

You never update indexer after its initial assignment, so every iteration uses the same key. Look:

int_index = 1
indexer = str(int_index).zfill(5)

for line in alllines:
    if re.search(match_string,line,flags=0):
        matched_lines[indexer] = line # indexer was always the same!
        int_index += 1
        indexer = str(int_index).zfill(5) # this should fix it
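
As an alternative, the key can be derived from a counter at the moment of each match, so there is no separate indexer variable that can go stale. This is only a sketch: the function name index_matches and the sample lines are made up for illustration and are not part of the original script.

```python
import re

def index_matches(lines, pattern):
    """Map zero-padded running numbers to the lines that match pattern."""
    # Keep only the matching lines, in their original order
    matching = (line for line in lines if re.search(pattern, line))
    # enumerate() supplies a fresh number for every match, starting at 1
    return {str(i).zfill(5): line for i, line in enumerate(matching, start=1)}

# Hypothetical sample data standing in for infile.readlines()
sample = ["Zeile 4\n", "Zeile 5\n", "\n", "Zeile 1\n", "Zeile 5\n"]
print(index_matches(sample, r"5$"))
```

Because the key is computed inside the comprehension, each matching line gets its own entry.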

Answer 2

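Inside the loop you update int_index but not indexer. Every iteration therefore uses the same indexer value and overwrites the same entry in the dict, which is why you end up with only one value.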
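
The overwriting is easy to reproduce in isolation. The snippet below is a minimal sketch with made-up sample lines; it contrasts a key computed once before the loop with a key recomputed on every iteration.

```python
# Hypothetical matching lines, standing in for the lines that pass re.search
matches = ["Zeile 5\n", "Zeile 5\n", "Zeile 5\n"]

# Key computed once before the loop: every assignment hits the same entry
overwritten = {}
indexer = str(1).zfill(5)
for line in matches:
    overwritten[indexer] = line

# Key recomputed inside the loop: every assignment creates a new entry
fixed = {}
int_index = 1
for line in matches:
    fixed[str(int_index).zfill(5)] = line
    int_index += 1

print(len(overwritten))  # 1
print(len(fixed))        # 3
```

Assigning to an existing dict key replaces the stored value without raising an error, which is why the original script ran cleanly and still kept only a single entry.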
