Question

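I am trying to get the index of the second match of the word "skills". I want to match the keyword where it stands alone on its own line, not where it appears inside a sentence.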

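import re

keyword = "skills"

def get_match_index(keyword, text):
    for sentence in text.split('\n'):
        # True only for a line that consists of just the keyword
        if keyword == sentence.lower().strip():
            # searches the whole text again, so the first occurrence is reported
            print(re.search(keyword, text))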

This returns the index of the first match. Here is the text.

Assessed and changed skills required to take company to next level in the IT, HR, Accounting.
-
College Station

Skills

Here I want to match the second instance of the keyword, "Skills", which is the heading, not the one inside the sentence.

Answer 1

You can use findall instead of search:

import re

keyword = "skills"

def get_match_index(keyword, text):
    for sentence in text.split('\n'):
        if keyword == sentence.lower().strip():
            # findall returns every occurrence in the text, not only the first
            print(re.findall(keyword, text))

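The documentation says:

re.search(pattern, string, flags=0): Scan through string looking for the first location where the regular expression pattern produces a match, and return a corresponding MatchObject instance.

and

re.findall(pattern, string, flags=0): Return all non-overlapping matches of pattern in string, as a list of strings.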
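
Note that findall returns the matched strings rather than their positions, so on its own it does not give an index. Below is a minimal sketch of collecting positions with re.finditer instead. The helper name find_all_spans and the shortened sample string are illustrative and not part of the original post.

```python
import re

def find_all_spans(keyword, text):
    """Return (start, end) for every case-insensitive occurrence of keyword."""
    # re.escape guards against regex metacharacters inside the keyword.
    pattern = re.compile(re.escape(keyword), re.IGNORECASE)
    return [m.span() for m in pattern.finditer(text)]

sample = "changed skills required\nSkills\n"
spans = find_all_spans("skills", sample)
print(spans)  # [(8, 14), (24, 30)]
```

The second tuple is the heading, but only because it happens to come second in this sample. Picking it reliably needs a pattern that requires the keyword to fill the whole line.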

Answer 2

To approach your problem a different way, you could look for the capitalized "Skills" instead:

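def get_match_index(keyword, text):
    # str.index is case-sensitive, so pass "Skills" to skip the lowercase one
    start_match = text.index(keyword)
    # the match ends after the keyword, not after the whole text
    end_match = start_match + len(keyword)
    return start_match, end_match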

The return value has the same form as the span() call of this regex:

import re

def get_match_index(keyword, text):
    # the lookbehind only accepts a keyword that directly follows a newline
    pattern = re.compile(f"(?<=\n){keyword}")
    return pattern.search(text.lower()).span()
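
A quick check that the two forms agree on text like the question's. The function names span_by_index and span_by_regex and the shortened sample are illustrative, and the sketch assumes the end offset is the start plus the keyword length.

```python
import re

def span_by_index(keyword, text):
    # str.index finds the first case-sensitive occurrence.
    start = text.index(keyword)
    return start, start + len(keyword)

def span_by_regex(keyword, text):
    # The lookbehind requires a newline right before the keyword.
    pattern = re.compile(r"(?<=\n)" + re.escape(keyword.lower()))
    match = pattern.search(text.lower())
    return match.span() if match else None

sample = "Assessed and changed skills required.\n-\nCollege Station\n\nSkills\n"
print(span_by_index("Skills", sample))
print(span_by_regex("skills", sample))
```

Neither form checks that the keyword is the whole line, and the regex form misses a keyword on the very first line because no newline precedes it.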

Answer 3

Finally got the desired result. Thanks to @mrzasa for suggesting the finditer method. Thanks @Arne, your approach gets the capitalized match.

import re

# ^ and $ anchor at line boundaries under re.MULTILINE
pattern = r'^skills$'
regex = re.compile(pattern, re.IGNORECASE | re.MULTILINE)

# text is the string shown in the question
match_tup = [match.span() for match in regex.finditer(text)]
print(match_tup)
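
The same idea can be wrapped in a reusable function, sketched here under a few assumptions: standalone_spans is a hypothetical name, the sample text is shortened from the question, and tolerating spaces or tabs around the keyword is an addition that mirrors the strip() call in the question.

```python
import re

def standalone_spans(keyword, text):
    """Spans of lines that consist only of keyword (case-insensitive)."""
    # ^ and $ anchor at line boundaries under re.MULTILINE;
    # [ \t]* tolerates stray spaces or tabs around the keyword.
    pattern = re.compile(
        r"^[ \t]*(" + re.escape(keyword) + r")[ \t]*$",
        re.IGNORECASE | re.MULTILINE,
    )
    # span(1) reports the keyword itself, without surrounding whitespace.
    return [m.span(1) for m in pattern.finditer(text)]

text = (
    "Assessed and changed skills required to take company to next level.\n"
    "-\n"
    "College Station\n"
    "\n"
    "Skills\n"
)
spans = standalone_spans("skills", text)
print(spans)
```

Only the heading line matches, so the occurrence inside the sentence is skipped without having to count matches.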

Unable to get the index of the second match
