Question

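I want to get the 3 words on either side of a given word. In this example, that means returning up to 3 words to the left of "to" and up to 3 words to the right of it.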

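import re

sentence = "#allows us to be free from the place"
key = "to"

left = []
right = []
# Up to 3 "word + separator" groups before the key, and up to 3 after it.
m = re.search(r'((?:\w+\W+){,3})' + key + r'\W+((?:\w+\W+){,3})', sentence)

if m:
    # Split each captured group into a list of words.
    l = [x.strip().split() for x in m.groups()]
    # l holds two lists: the words to the left and the words to the right.
    left, right = l[0], l[1]
print(left, right)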

Output:

['allows', 'us'] ['be', 'free', 'from']

As you can see, the '#' symbol is missing from the output. Expected output:

['#allows', 'us'] ['be', 'free', 'from']

Note: because there are only 2 words to the left of "to", only two are returned there, even though the regex allows up to 3.

In some cases, the key may be more than one word.

What is causing this, and how can I fix it? Thanks.
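For reference, a possible regex-only fix, sketched under the assumption that a "word" means any run of non-whitespace characters. The `#` goes missing because `\w` matches only word characters (letters, digits, underscore), so `#` can never begin a `\w+\W+` repetition and the match starts at `allows`. Swapping `\w+\W+` for `\S+\s+` keeps such symbols attached. The helper name `words_around` is hypothetical; `re.escape` guards against regex metacharacters in the key, and joining the escaped key parts with `\s+` is one way to support multi-word keys.

```python
import re


def words_around(sentence, key, n=3):
    """Return (left, right): up to n words on each side of the first match of key.

    A "word" here is any run of non-whitespace characters, so symbols such
    as '#' stay attached to the word they belong to.
    """
    # Escape each part of the key (it may contain regex metacharacters) and
    # allow any whitespace between the parts, so multi-word keys work too.
    key_pattern = r'\s+'.join(re.escape(part) for part in key.split())
    # \S+\s+ instead of the original \w+\W+: '#' is not a \w character,
    # which is why the original match only started at "allows".
    # The leading (?<!\S) makes the left context start at the start of a word.
    left_ctx = r'(?<!\S)((?:\S+\s+){0,%d})' % n
    right_ctx = r'((?:\s+\S+){0,%d})' % n
    # The lookarounds around the key make it match only as whole
    # whitespace-delimited tokens ("to" is not found inside "into").
    pattern = left_ctx + r'(?<!\S)' + key_pattern + r'(?!\S)' + right_ctx
    m = re.search(pattern, sentence)
    if m is None:
        return [], []
    return m.group(1).split(), m.group(2).split()


sentence = "#allows us to be free from the place"
left, right = words_around(sentence, "to")
print(left, right)                      # ['#allows', 'us'] ['be', 'free', 'from']
print(words_around(sentence, "to be"))  # (['#allows', 'us'], ['free', 'from', 'the'])
```

Returning two empty lists when the key is absent mirrors the `left = []` / `right = []` defaults in the question's code; raising an exception instead would be an equally reasonable design choice.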

Answer 1

There's no need to use a regular expression for this. You can use list slicing instead.

sentence = '#allows us to be free from the place'
search_word = 'to'
context = 3  # number of words to keep on each side

words = sentence.split()  # split on whitespace, so '#allows' stays one token

try:
    word_index = words.index(search_word)  # raises ValueError if absent
    start = max(0, word_index - context)
    stop = min(word_index + 1 + context, len(words))
    context_words = words[start:stop]
    print(context_words)
except ValueError:
    print('search_word not in the sentence')

Prints:

['#allows', 'us', 'to', 'be', 'free', 'from']

If you want separate "before" and "after" lists, use two slices.
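The two-slice version can be sketched as follows. The function name `context_lists` is hypothetical, and matching the key against consecutive tokens is an assumption added to cover the question's note that the key may be more than one word (`words.index` only finds a single token).

```python
def context_lists(sentence, key, context=3):
    """Return (before, after) lists of up to `context` words around `key`.

    `key` may contain several words; it is compared against consecutive
    whitespace-separated tokens. Returns None if the key is not found.
    """
    words = sentence.split()
    key_words = key.split()
    k = len(key_words)
    # Slide a window of k tokens over the sentence; stop at the first match.
    for i in range(len(words) - k + 1):
        if words[i:i + k] == key_words:
            # max(0, ...) avoids a negative start, which would wrap around.
            before = words[max(0, i - context):i]  # slice 1: words before the key
            # Slicing past the end is safe in Python, so no min() is needed.
            after = words[i + k:i + k + context]   # slice 2: words after the key
            return before, after
    return None


sentence = '#allows us to be free from the place'
print(context_lists(sentence, 'to'))     # (['#allows', 'us'], ['be', 'free', 'from'])
print(context_lists(sentence, 'to be'))  # (['#allows', 'us'], ['free', 'from', 'the'])
```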
