Question

Is there a way (Pattern, Python, NLTK, etc.) to detect that a sentence contains a list of words?

For example:

The cat ran into the hat, box, and house. | The list would be hat, box, and house

This could be handled with plain string processing, but we may have more general lists:

For example:

The cat likes to run outside, run inside, or jump up the stairs. | List = run outside, run inside, or jump up the stairs

The list could sit in the middle of a paragraph or at the end of a sentence, which complicates things further.

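I have been using Pattern for Python for a while and I don't see a way to approach this. I'm curious whether there is a way to do it with Pattern or NLTK (Natural Language Toolkit).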
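A rough heuristic sketch, not taken from Pattern or NLTK: assume every item except the last is followed by a comma and the last item is introduced by "and" or "or" (the serial comma used in both examples). A regular expression finds the run, and the first chunk, which still carries the start of the sentence, is trimmed to as many words as the second item. The name find_lists is hypothetical.

```python
import re

# One or more comma-terminated chunks, then "and"/"or" and the final item.
# Items may not contain sentence punctuation.
LIST_PATTERN = re.compile(r"(?:[^,.;:!?]+,\s*)+\b(?:and|or)\s+[^,.;:!?]+")


def find_lists(text):
    """Return the items of every "A, B, and C"-style list found in text."""
    lists = []
    for match in LIST_PATTERN.finditer(text):
        parts = [part.strip() for part in match.group(0).split(",")]
        # The last chunk starts with the conjunction; drop it.
        parts[-1] = re.sub(r"^(?:and|or)\s+", "", parts[-1])
        # Heuristic: the first chunk also holds the sentence's lead-in
        # ("The cat ran into the hat"), so keep only as many trailing
        # words as the second item has.
        n = len(parts[1].split())
        parts[0] = " ".join(parts[0].split()[-n:])
        lists.append(parts)
    return lists


print(find_lists("The cat ran into the hat, box, and house."))
# [['hat', 'box', 'house']]
print(find_lists("The cat likes to run outside, run inside, or jump up the stairs."))
# [['run outside', 'run inside', 'jump up the stairs']]
```

The trimming rule is only a guess: a list followed by more text ("eggs, milk, and bread at the store") keeps the trailing words in its last item, and lists without the serial comma are missed. Anything more robust would need a chunker or parser.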

Answer 1

From your question, I think you want to check whether all the words in a list appear in a sentence.

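In general, to test list elements you can use the built-in all() function. It returns True if every element of the iterable passed to it is true.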

listOfWords = ['word1', 'word2', 'word3', 'two words']
sentence = "word1 as word2 a fword3 af two words"

# True only if every entry occurs somewhere in the sentence (substring test)
if all(word in sentence for word in listOfWords):
    print("All words in sentence")
else:
    print("Missing")

Output:

All words in sentence

I think this may serve your purpose. If not, please clarify your question.
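One caveat: word in sentence is a substring test, which is why 'word3' counts as found inside 'fword3' in the example above. A minimal sketch that matches whole words only, using the standard re module with \b word boundaries (contains_all_words is a hypothetical helper name):

```python
import re


def contains_all_words(sentence, words):
    # \b anchors each entry at word boundaries, so "word3" no longer
    # matches inside "fword3"; re.escape guards against regex metacharacters.
    return all(re.search(r"\b" + re.escape(w) + r"\b", sentence) for w in words)


listOfWords = ['word1', 'word2', 'word3', 'two words']
print(contains_all_words("word1 as word2 a fword3 af two words", listOfWords))  # False
print(contains_all_words("word1 as word2 a word3 af two words", listOfWords))   # True
```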

Answer 2

How about using sent_tokenize (from nltk.tokenize import sent_tokenize)?

from nltk.tokenize import sent_tokenize

sent_tokenize("Hello SF Python. This is NLTK.")
# ['Hello SF Python.', 'This is NLTK.']

Then you can use that list of sentences like this:

for sentence in my_list:
    # test whether this sentence contains the words you want,
    # using all() as in Answer 1
    if all(word in sentence for word in listOfWords):
        print(sentence)

More information here
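A sketch of the whole flow without installing NLTK, assuming a naive regex split on ".", "!" or "?" followed by whitespace is good enough for your text. This is a swapped-in stand-in, not sent_tokenize, which uses the Punkt model to deal with abbreviations and similar cases; naive_sent_tokenize and sentences_with_all are hypothetical names.

```python
import re


def naive_sent_tokenize(text):
    # Split after ".", "!" or "?" followed by whitespace. A crude stand-in
    # for nltk.tokenize.sent_tokenize; it will break on abbreviations.
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]


def sentences_with_all(text, words):
    # Keep only the sentences that contain every word (substring test).
    return [s for s in naive_sent_tokenize(text)
            if all(w in s for w in words)]


text = "Hello SF Python. This is NLTK."
print(naive_sent_tokenize(text))                    # ['Hello SF Python.', 'This is NLTK.']
print(sentences_with_all(text, ["This", "NLTK"]))   # ['This is NLTK.']
```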

Answer 3

all(word in sentence for word in listOfWords)

Answer 4

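With a trie, checking a sentence takes O(n), where n is the number of words in the sentence, once the trie has been built from the word list; building it takes O(m), where m is the number of words in the list.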

Algorithm

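Split the sentence into a list of words separated by spaces.
For each word, check whether it is a key in the trie, i.e. whether the word exists in the list.
- If it exists, add the word to the result, to keep track of how many words from the list are present in the sentence.
- Keep track of the words that have a subtrie, i.e. the current word is a prefix of a longer entry in the word list.
  - For each of these words, check whether extending it with the current word makes it a key or a subtrie in the word list.
- If it is a subtrie, add it to the extended_words set and check whether joining it with the next word gives an exact match.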

Code

import pygtrie

listOfWords = ['word1', 'word2', 'word3', 'two words']

# Use a space as the separator so that multi-word entries such as
# 'two words' are stored as the path 'two' -> 'words'.
trie = pygtrie.StringTrie(separator=' ')
for word in listOfWords:
    trie[word] = True

sentence = "word1 as word2 a fword3 af two words"
sentence_words = sentence.split()
words_found = set()
# Partial phrases seen so far that are prefixes of longer entries.
extended_words = set()

for possible_word in sentence_words:
    has_possible_word = trie.has_node(possible_word)

    # The single word itself is an entry in the list.
    if has_possible_word & trie.HAS_VALUE:
        words_found.add(possible_word)

    # Try to extend every pending prefix with the current word.
    pending, extended_words = extended_words, set()
    for extended_word in pending:
        possible_extended_word = extended_word + ' ' + possible_word
        has_possible_extended_word = trie.has_node(possible_extended_word)

        if has_possible_extended_word & trie.HAS_VALUE:
            words_found.add(possible_extended_word)

        if has_possible_extended_word & trie.HAS_SUBTRIE:
            extended_words.add(possible_extended_word)

    # The current word may start a longer multi-word entry.
    if has_possible_word & trie.HAS_SUBTRIE:
        extended_words.add(possible_word)

print(words_found)
print(len(words_found) == len(listOfWords))  # False here: 'word3' only occurs inside 'fword3'

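This is useful if your word list is large and you don't want to iterate over it every time, or if you run many queries against the same word list.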
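If pygtrie is not available, the same word-level trie idea can be sketched with nested dicts from the standard library. This is an illustration under that assumption, not pygtrie's implementation; build_trie, find_phrases and the END sentinel are hypothetical names. Instead of a separate extended_words set, each open partial match is kept as a (node, words so far) pair.

```python
# Sentinel key marking "a list entry ends at this node"; an object() can
# never collide with a string token.
END = object()


def build_trie(phrases):
    """Build a word-level trie: each node is a dict keyed by the next word."""
    root = {}
    for phrase in phrases:
        node = root
        for token in phrase.split():
            node = node.setdefault(token, {})
        node[END] = True
    return root


def find_phrases(trie, sentence):
    """Return every list entry that occurs as consecutive words in sentence."""
    found = set()
    active = []  # (node, words so far) for partial matches still open
    for token in sentence.split():
        # Any word can also start a fresh match at the root.
        candidates = active + [(trie, [])]
        active = []
        for node, words in candidates:
            child = node.get(token)
            if child is None:
                continue
            path = words + [token]
            if END in child:  # plays the role of HAS_VALUE
                found.add(" ".join(path))
            if any(key is not END for key in child):  # plays the role of HAS_SUBTRIE
                active.append((child, path))
    return found


listOfWords = ['word1', 'word2', 'word3', 'two words']
trie = build_trie(listOfWords)
found = find_phrases(trie, "word1 as word2 a fword3 af two words")
print(sorted(found))                    # ['two words', 'word1', 'word2']
print(len(found) == len(listOfWords))   # False: 'word3' only occurs inside 'fword3'
```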

The code is here

Determine if a list of words is in a sentence?
