Question

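I'm trying to write a Python 3.6.0 script that finds elements in a page. It extracts the rest of the line after a word that appears in two formats: "Element:" or "Element :" (with a space before the ":").

So I tried using regular expressions. It only works half the time, and I can't figure out what's wrong with my code. Here is the code with an example: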

import re

TestString = r"""Some text
Year: 2015.12.10
Some other text
"""

ListOfTags = ["Year(?= ?):", "Year(?=\s?):", "Year(?= *):"]

for i in range(0, len(ListOfTags)):
    try:
        TagsFound = re.search(str(ListOfTags[i]) + '(.+?)\n', TestString).group(1)
        print('"' + ListOfTags[i] + '" returns: ' + TagsFound)
    except AttributeError:
        # Pattern not found in TestString (re.search returned None)
        TagsFound = ''
        print("No tag found..")

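(With this code I can test several expressions at once.)

Here, when the text is "Year: 2015.12.10", all of the regular expressions work and return "2015.12.10".

But they don't work when the text is "Year : 2015.12.10" (with a space before the ":")...

I also tried the expressions "Year( ?):", "Year(\s?):", "Year( *):" and "Year( |:?)( |:?)", but they didn't work either.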

Answer 1

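I think regular expressions may be overkill here (unless you have a good reason to use them). You could process the text line by line instead. For each line, you can use the partition method on str to split it at the first colon found.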

for line in TestString.splitlines():
    if ':' in line:
        tag, __, value = line.partition(':')
        #Now see if this is a tag you care about and do something with the value
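
To fill in that last comment, here is a minimal sketch of one way it could look. The helper names find_tag_partition and find_tag_regex are made up for illustration and are not from the question. The first function strips the tag, so "Year:" and "Year :" are treated the same. The second is there in case a regex is still preferred: as far as I can tell, the patterns in the question fail because a lookahead such as (?= ?) is zero-width, so it never consumes the space and the following ":" still has to come directly after "Year". Matching the optional whitespace directly avoids that.

```python
import re

SAMPLES = [
    "Some text\nYear: 2015.12.10\nSome other text\n",
    "Some text\nYear : 2015.12.10\nSome other text\n",
]


def find_tag_partition(text, wanted):
    """Return the value following 'wanted:' or 'wanted :', or None if absent."""
    for line in text.splitlines():
        tag, sep, value = line.partition(':')
        # sep is '' when the line has no colon; strip() absorbs the optional space
        if sep and tag.strip() == wanted:
            return value.strip()
    return None


def find_tag_regex(text, wanted):
    """Same lookup with a regex that consumes the optional space before ':'."""
    # [ \t]* is used instead of \s* so the match cannot run across a newline
    pattern = r'^' + re.escape(wanted) + r'[ \t]*:[ \t]*(.+)$'
    match = re.search(pattern, text, re.MULTILINE)
    return match.group(1).strip() if match else None


if __name__ == "__main__":
    for sample in SAMPLES:
        print(find_tag_partition(sample, "Year"), find_tag_regex(sample, "Year"))
```

Both functions return None when the tag is missing, which avoids the try/except around .group(1).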
