Question

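I am trying to use spaCy to match the sentences in a document that contain both of two keywords ('fox' and 'dog'). My problem is that I don't want to always specify the order of the words, and I don't want to specify how many words appear between the words of interest. I just want to match sentences in which both are present. Is this possible?

For example, is it possible to write a single rule that matches sentence 1 and sentence 2, but not sentence 3 or sentence 4?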

sentence1 = 'The quick brown fox jumps over the lazy dog.'

sentence2 = 'The quick fox is brown and jumps over the lazy dog.'

sentence3 = 'There is a fox in my back garden'

sentence4 = 'There is a dog in my back garden'

A typical spaCy matching rule looks like this:

pattern = [{"LEMMA": "dog"}, {"LEMMA": "fox"}]

Obviously this does not work in my case, because spaCy expects 'dog' and 'fox' to appear right next to each other, in that order.

Answer 1

Have you tried splitting the sentences and then checking the words? For your example:

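# sentence1 and sentence2 are the strings defined in the question
list1 = sentence1.split()
list2 = sentence2.split()

# Count the words of sentence1 that also occur in sentence2
count = 0
for word in list1:
    if word in list2:
        count += 1

print('Match words =', count)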

If you want to count each word only once and ignore capitalization, then:

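# Lowercase first, then use sets so that each distinct word is counted once
set1 = set(sentence1.lower().split())
set2 = set(sentence2.lower().split())

# Count the distinct words that both sentences share
count = 0
for word in set1:
    if word in set2:
        count += 1

print('Match words =', count)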
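The same set idea can be pointed at the original question, which asks whether a sentence contains both keywords. The following is only a minimal sketch. The helper name `contains_all` is made up for illustration, and it compares exact lowercase words with surrounding punctuation stripped, so it would not catch inflected forms such as 'foxes' the way a LEMMA pattern would.

```python
import string

def contains_all(sentence, keywords):
    """Return True if every keyword occurs as a whole word in the sentence."""
    # Lowercase, split on whitespace, and strip surrounding punctuation
    # so that 'dog.' is compared as 'dog'.
    words = {w.strip(string.punctuation) for w in sentence.lower().split()}
    return set(keywords) <= words

sentences = [
    'The quick brown fox jumps over the lazy dog.',
    'The quick fox is brown and jumps over the lazy dog.',
    'There is a fox in my back garden',
    'There is a dog in my back garden',
]

# Keywords are assumed to be given in lowercase.
for s in sentences:
    print(contains_all(s, ['fox', 'dog']), '-', s)
```

With the four sentences from the question, only the first two contain both 'fox' and 'dog', regardless of word order or distance.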

Answer 2

Use it together with this code:

{"OP":"|"}
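spaCy's documented `OP` values are quantifiers (`!`, `?`, `+`, `*`), so an approach that does not rely on operators may be easier to reason about: register one single-token pattern per keyword and keep only the sentences in which every pattern fires. The sketch below assumes spaCy v3. It uses a blank English pipeline with the rule-based `sentencizer`, so no model download is needed, and it matches on `LOWER` rather than `LEMMA` because a blank pipeline has no lemmatizer. The function name `sentences_with_all_keywords` is hypothetical.

```python
import spacy
from spacy.matcher import Matcher

# Assumption: spaCy v3. A blank pipeline has no lemmatizer, so the
# patterns use LOWER (lowercased token text) instead of LEMMA.
nlp = spacy.blank("en")
nlp.add_pipe("sentencizer")  # rule-based sentence boundaries

matcher = Matcher(nlp.vocab)
matcher.add("FOX", [[{"LOWER": "fox"}]])  # one single-token pattern per keyword
matcher.add("DOG", [[{"LOWER": "dog"}]])
REQUIRED = {"FOX", "DOG"}

def sentences_with_all_keywords(text):
    """Return the sentences of `text` in which every required pattern matches."""
    doc = nlp(text)
    hits = []
    for sent in doc.sents:
        # Run the matcher on each sentence separately and collect
        # the names of the patterns that matched in it.
        matches = matcher(sent.as_doc())
        found = {nlp.vocab.strings[match_id] for match_id, start, end in matches}
        if REQUIRED <= found:
            hits.append(sent.text)
    return hits

text = (
    "The quick brown fox jumps over the lazy dog. "
    "There is a fox in my back garden. "
    "There is a dog in my back garden."
)
print(sentences_with_all_keywords(text))
```

With a trained pipeline such as `en_core_web_sm` loaded instead of the blank one, the patterns could presumably go back to `{"LEMMA": "fox"}` and `{"LEMMA": "dog"}`, as in the question.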
