Question

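I am writing a function that takes a block of text, splits it into sentences, and then searches each sentence for two words that occur within a certain distance of each other. The function also distinguishes between "present" and "future" constructions, which are captured with a negative and a positive lookahead, respectively.

The regex works fine in Regex101, but in this Python function it returns no matches.

I stepped through the function in a debugger and the input looks fine. I have also made all the necessary changes to go from PHP to Python regex syntax, so I do not think that is the problem either.

Here is the entire function: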

def scope_search(text, word_list1, word_list2, tense, prox=25):

    import regex
    # split the document into sentences, but ignore decimal points in the middle of the sentence
    sentences = [each.lower().lstrip() for each in regex.split(r'(?!.*\d)\.', text) if each]

    # if tense is 'present', the regex pattern will include negative lookahead for future language in the sentence
    if tense == 'present':
        pattern = '^(?!(hope|expect|will|going to|in the future|plan(ning)? on|anticipate?(ing)?|foresee(ing)?|forecasts?))(\\b({0})\\b.{{0,{1}}}\\b({2})\\b)|(\\b({2})\\b.{{0,{1}}}\\b({0})\\b)$'.format(word_list1,prox,word_list2)
    # if the tense is 'future', the regex pattern will include positive lookahead for future language
    elif tense == 'future':
        pattern = '^(?=(hope|expect|will|going to|in the future|plan(ning)? on|anticipate?(ing)?|foresee(ing)?|forecasts?))(\\b({0})\\b.{{0,{1}}}\\b({2})\\b)|(\\b({2})\\b.{{0,{1}}}\\b({0})\\b)$'.format(word_list1,prox,word_list2)

    matches = []
    # search sentence by sentence for all relevant matches
    for sentence in sentences:
        matches.append(regex.findall(pattern, sentence))
    matches = [each for each in list(matches[0]) if each]

    return matches
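
A reduced sketch of two behaviors in the function above that may be worth checking. It uses the standard-library `re` module instead of `regex` (an assumption made so that it runs without extra packages), and the sample text and the shortened pattern are made up for illustration. It shows what the posted split pattern does to a text that contains a decimal number, and how the top-level `|` and the `^`-anchored lookahead affect matching.

```python
import re

sample = "Sales grew. Revenue rose 2.5 percent. Orders fell."

# Split pattern as posted: the lookahead rejects a period whenever a digit
# appears anywhere later in the text, not only inside a decimal number.
split_posted = re.split(r'(?!.*\d)\.', sample)

# One possible alternative: split on periods not directly followed by a digit.
split_alt = re.split(r'\.(?!\d)', sample)

sentence = "we saw strong growth in sales this year"

# Shortened version of the posted pattern. The top-level | splits it into
# "^(?!will)growth...sales" OR "sales...growth$", and the lookahead only
# inspects the very start of the sentence.
as_posted = r'^(?!will)(\bgrowth\b.{0,25}\bsales\b)|(\bsales\b.{0,25}\bgrowth\b)$'
posted_result = re.findall(as_posted, sentence)

# One possible regrouping: let the lookahead scan the whole sentence with .*,
# skip leading text lazily, and wrap both word orders in a single group.
regrouped = r'^(?!.*\bwill\b).*?(\bgrowth\b.{0,25}\bsales\b|\bsales\b.{0,25}\bgrowth\b)'
regrouped_result = re.findall(regrouped, sentence)

print(split_posted)
print(split_alt)
print(posted_result)
print(regrouped_result)
```

Note also that `findall` returns tuples of groups when the pattern has more than one capture group, and that `matches[0]` in the function only keeps the results for the first sentence.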

I thought it might be a string formatting problem, but the formatting works fine outside this function.
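
A quick way to confirm the formatting is to build one half of the pattern and inspect the result. This is a minimal sketch; the two short word lists are stand-ins, not the real lists.

```python
prox = 25

# Same escaping as in the function: {{ and }} become literal braces,
# while {0}, {1} and {2} are replaced by the arguments of format().
template = '\\b({0})\\b.{{0,{1}}}\\b({2})\\b'
formatted = template.format('grow|growth', prox, 'sales|revenues?')

print(formatted)
```

If the printed pattern looks as expected, the formatting step is unlikely to be the cause.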

Here are the two word lists I am searching with:

word_list1 = r'(increase?|double?|high(er)?|strong|strength|grow|growth|grew|(go|goes|went|going) up)(s|ing)?'

word_list2 = r'(AUM|FUM|assets under management|funds under management|shipments|basis points|earnings|sales|revenues?|deposits|orders?|(new )?participants?( counts?)?)'

Again, all of this works fine in Regex101, so I do not think the regex itself is the problem. Any help is greatly appreciated.
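
For comparison, here is one possible reworking of the function. It is a sketch under stated assumptions, not a confirmed fix. It uses the standard-library `re` module, matches case-insensitively instead of lowercasing (the second list contains uppercase `AUM` and `FUM`), adds a leading `\b` to the future words, reports at most one match per sentence, and rejects an unknown `tense`. The sample text is made up.

```python
import re

# Future-language words from the question, with non-capturing groups.
# The leading \b is an added assumption so that e.g. "goodwill" is not a hit.
FUTURE = (r'\b(?:hope|expect|will|going to|in the future|plan(?:ning)? on|'
          r'anticipate?(?:ing)?|foresee(?:ing)?|forecasts?)')


def scope_search(text, word_list1, word_list2, tense, prox=25):
    # The lookahead scans the whole sentence (.*), not just its first word.
    if tense == 'present':
        lookahead = '(?!.*{0})'.format(FUTURE)
    elif tense == 'future':
        lookahead = '(?=.*{0})'.format(FUTURE)
    else:
        raise ValueError("tense must be 'present' or 'future'")

    # Both word orders sit inside one outer group, which is group 1.
    pattern = re.compile(
        r'^{look}.*?(\b(?:{a})\b.{{0,{n}}}\b(?:{b})\b|\b(?:{b})\b.{{0,{n}}}\b(?:{a})\b)'
        .format(look=lookahead, a=word_list1, b=word_list2, n=prox),
        re.IGNORECASE)

    matches = []
    # Split on periods that are not directly followed by a digit.
    for sentence in re.split(r'\.(?!\d)', text):
        found = pattern.search(sentence.strip())
        if found:
            matches.append(found.group(1))
    return matches


word_list1 = (r'(increase?|double?|high(er)?|strong|strength|grow|growth|grew|'
              r'(go|goes|went|going) up)(s|ing)?')
word_list2 = (r'(AUM|FUM|assets under management|funds under management|shipments|'
              r'basis points|earnings|sales|revenues?|deposits|orders?|'
              r'(new )?participants?( counts?)?)')

sample = ("Sales growth was strong at 2.5 percent. "
          "We expect higher revenues next year. Deposits fell.")

present_hits = scope_search(sample, word_list1, word_list2, 'present')
future_hits = scope_search(sample, word_list1, word_list2, 'future')
print(present_hits)
print(future_hits)
```

Using `search` with `group(1)` avoids the tuples that `findall` returns for patterns with several capture groups, and collecting results inside the loop keeps matches from every sentence rather than only the first.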

Regex works in Regex101 but returns no matches in Python?

0 answers: