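Finding a phrase in a sentence

Asked: 2013-08-16 16:47:55

Tags: python nltk

I have to write a program that finds phrases in a sentence, given keywords that appear in that sentence: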

The Large Hadron Collider (LHC) is the world’s largest and most powerful particle accelerator. This site includes the latest news from the project, accessible explanations of how the LHC works, how it is funded, who works there and what benefits it brings us. You can access a wide range of resources for the public, journalists and teachers and students, there are also many links to other sources of information. The Large Hadron Collider at CERN near Geneva, Switzerland is opening new vistas on the deepest secrets of the universe, stretching the imagination with newly discovered forms of matter, forces of nature, and dimensions of space.

The user specifies:

['large', 'big', 'heavy']

I'm not sure how to pick up a few words before and after a keyword held in a variable. For example:

keyword = 'large'

must return

The Large Hadron

because 'large' is present in the sentence. How can I get the one word before and the one word after any of the keywords in the sentence?

3 answers:

Answer 0 (score: 3)

test_word = 'large'
my_string = 'The Large Hadron Collider (LHC) is the world’s largest and most powerful particle accelerator. This site includes the latest news from the project, accessible explanations of how the LHC works, how it is funded, who works there and what benefits it brings us'
# I truncated your sentence

# lowercased copy, used only to locate the keyword
test_words = my_string.lower().split()
correct_case = my_string.split()  # this will preserve the case of the original words
# and it will be identical in length to test_words, with each word in the same position
position = test_words.index(test_word)

# one word before the keyword, the keyword itself, and one word after
my_new_string = ' '.join(correct_case[position-1:position+2])

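To be clear, the two lists contain the same words in the same order; the test_words list just has everything lowercased. Your test_word sits at the same position in both lists, so you can use its position in test_words to pull the correctly cased words out of correct_case.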
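
One thing the slice above does not cover is the edges: if the keyword is the very first word, position-1 becomes -1 and the slice comes back empty, and index raises ValueError when the keyword is not in the sentence at all (as with 'big' and 'heavy' from the question). A minimal sketch of a wrapper that guards both cases follows; the function name phrase_around and the width parameter are made up for illustration and are not part of the answer above.

```python
def phrase_around(sentence, keyword, width=1):
    """Return `width` words on each side of the first match, or None if absent."""
    words = sentence.split()
    lowered = [w.lower() for w in words]  # same length and order as `words`
    try:
        position = lowered.index(keyword.lower())
    except ValueError:
        return None  # keyword does not occur as a whole word
    start = max(position - width, 0)  # clamp so a negative index cannot wrap around
    return ' '.join(words[start:position + width + 1])


sentence = 'Large colliders need the Large Hadron Collider team'
print(phrase_around(sentence, 'large'))  # Large colliders
print(phrase_around(sentence, 'big'))    # None
```

The upper end of the slice needs no clamping, because Python slices that run past the end of a list simply stop at the last element.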

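Answer 1 (score: 0)

How about using index to get the position of the keyword, then slicing the list of words to take one word on either side of the keyword?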

In [1]: s = 'The Large Hadron Collider (LHC) is the world’s largest and most powerful particle accelerator.'
In [2]: words = s.split() 
In [3]: words_lower = s.lower().split() #lowercase words so keyword matching is easy.
In [4]: keyword = 'large'
In [5]: i = words_lower.index(keyword)
In [6]: phrase = ' '.join(words[i-1:i+2])
In [7]: phrase
Out[7]: 'The Large Hadron'
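
Note that split() leaves punctuation attached to the words, so a keyword such as 'lhc' would not match the token '(LHC)' and index would raise ValueError. A rough sketch of one way around that is to strip surrounding punctuation on the lookup copy only, so the returned phrase still shows the original text. The name find_phrase is hypothetical, and string.punctuation covers ASCII punctuation only, so this is an assumption that the input is mostly plain English text.

```python
import string


def find_phrase(sentence, keyword):
    words = sentence.split()
    # Lookup copy: lowercased, with leading/trailing ASCII punctuation removed.
    normalized = [w.strip(string.punctuation).lower() for w in words]
    if keyword.lower() not in normalized:
        return None
    i = normalized.index(keyword.lower())
    # Slice the untouched words so case and punctuation are preserved.
    return ' '.join(words[max(i - 1, 0):i + 2])


s = 'The Large Hadron Collider (LHC) is the largest accelerator.'
print(find_phrase(s, 'lhc'))          # Collider (LHC) is
print(find_phrase(s, 'accelerator'))  # largest accelerator.
```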

Answer 2 (score: 0)

text = "The Large Hadron Collider (LHC) is the world’s largest and most powerful particle accelerator. This site includes the latest news from the project, accessible explanations of how the LHC works, how it is funded, who works there and what benefits it brings us. You can access a wide range of resources for the public, journalists and teachers and students, there are also many links to other sources of information. The Large Hadron Collider at CERN near Geneva, Switzerland is opening new vistas on the deepest secrets of the universe, stretching the imagination with newly discovered forms of matter, forces of nature, and dimensions of space."
keywords = ['large', 'is', 'most']
# lowercase everything so matching is case-insensitive (results are lowercase too)
text = text.lower().split(' ')
results = []
for word in keywords:
    # index() finds the first occurrence and raises ValueError if the word is absent
    indx = text.index(word)
    results.append(" ".join(text[indx-1:indx+2]))

print(results)
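
The loop above reports only the first occurrence of each keyword and stops with a ValueError on a keyword that is missing, which would happen for 'big' and 'heavy' in the question's list. As a sketch of one possible extension (the function name phrases_for_keywords and the width parameter are invented here), enumerate can collect every position where a keyword matches, and a missing keyword simply maps to an empty list:

```python
def phrases_for_keywords(text, keywords, width=1):
    words = text.split()
    lowered = [w.lower() for w in words]  # lookup copy; `words` keeps original case
    found = {}
    for keyword in keywords:
        target = keyword.lower()
        # One phrase per matching position; an absent keyword yields [].
        found[keyword] = [
            ' '.join(words[max(i - width, 0):i + width + 1])
            for i, w in enumerate(lowered)
            if w == target
        ]
    return found


text = ('The Large Hadron Collider is big science. '
        'The Large Hadron Collider is near Geneva.')
print(phrases_for_keywords(text, ['large', 'big', 'heavy']))
# {'large': ['The Large Hadron', 'The Large Hadron'], 'big': ['is big science.'], 'heavy': []}
```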