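I am trying to use spaCy to match sentences in a document that contain both of two keywords ('fox' and 'dog'). My problem is that I don't want to always specify the order of the words, nor do I want to specify how many words sit between the words of interest. I only want to match sentences in which both are present. Is this possible?
For example, is it possible to write a single rule that matches sentence 1 and sentence 2 but does not match sentence 3 or sentence 4?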
sentence1 = 'The quick brown fox jumps over the lazy dog.'
sentence2 = 'The quick fox is brown and jumps over the lazy dog.'
sentence3 = 'There is a fox in my back garden'
sentence4 = 'There is a dog in my back garden'
A typical spaCy matching rule looks like this:
pattern = [{"LEMMA": "dog"}, {"LEMMA": "fox"}]
Obviously this does not work in my case, because spaCy only matches this pattern when 'dog' and 'fox' appear right next to each other.
Answer 0 (score: 0)
Have you tried splitting the sentences and then checking the words? In your example:
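# Split both sentences into whitespace-separated words.
list1 = sentence1.split()
list2 = sentence2.split()
# Count the words of sentence1 that also occur in sentence2.
count = 0
for word in list1:
    if word in list2:
        count += 1
print('Match words =', count)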
If you want to count each word only once and ignore capitalization, then:
# Lowercase first, then use sets so each distinct word is counted once.
set1 = set(sentence1.lower().split())
set2 = set(sentence2.lower().split())
count = 0
for word in set1:
    if word in set2:
        count += 1
print('Match words =', count)
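The snippets above compare sentence1 with sentence2. To answer the original question with the same split-and-check idea, each sentence can instead be tested against the keyword set. This is only a minimal sketch, assuming plain lowercase whole-word matching is good enough (there is no lemmatization, so 'foxes' would not count as 'fox'); `contains_all` is a hypothetical helper name, and a regex is used instead of `split()` so that 'dog.' is still read as 'dog'.

```python
import re


def contains_all(sentence, keywords):
    """Return True if every keyword occurs as a whole word in the sentence."""
    # Lowercase the text and keep only alphabetic runs, dropping punctuation.
    words = set(re.findall(r"[a-z]+", sentence.lower()))
    # Subset test: word order and distance between the keywords do not matter.
    return set(keywords) <= words


sentences = [
    "The quick brown fox jumps over the lazy dog.",
    "The quick fox is brown and jumps over the lazy dog.",
    "There is a fox in my back garden",
    "There is a dog in my back garden",
]

# Keep only the sentences that mention both keywords.
matches = [s for s in sentences if contains_all(s, {"fox", "dog"})]
for s in matches:
    print(s)
```

With the four example sentences, only the first two are kept.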
Answer 1 (score: 0)
Use it together with this code:
{"OP":"|"}
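For comparison, here is a rough sketch of a different route that sidesteps token patterns entirely: go through the sentences and check whether the set of lemmas in each one contains both keywords. It is not taken from either answer. `sentences_with_all` and `fake_sentence` are hypothetical names, and the stand-in tokens (simple objects with a `lemma_` attribute, mimicking spaCy tokens) exist only so the sketch runs without a language model installed.

```python
from types import SimpleNamespace


def sentences_with_all(sentences, lemmas):
    """Return the sentences whose token lemmas include every wanted lemma."""
    wanted = {lemma.lower() for lemma in lemmas}
    # Each sentence is any iterable of objects exposing a `lemma_` string.
    return [
        sent for sent in sentences
        if wanted <= {tok.lemma_.lower() for tok in sent}
    ]


def fake_sentence(*lemmas):
    """Build a stand-in sentence (assumption: mimics tokens with `lemma_`)."""
    return [SimpleNamespace(lemma_=lemma) for lemma in lemmas]


sents = [
    fake_sentence("the", "quick", "brown", "fox", "jump", "over", "the", "lazy", "dog"),
    fake_sentence("the", "quick", "fox", "be", "brown", "and", "jump", "over", "the", "lazy", "dog"),
    fake_sentence("there", "be", "a", "fox", "in", "my", "back", "garden"),
    fake_sentence("there", "be", "a", "dog", "in", "my", "back", "garden"),
]

hits = sentences_with_all(sents, {"fox", "dog"})
print(len(hits), "of", len(sents), "sentences contain both lemmas")
```

With a real pipeline, assuming one that provides sentence boundaries and a lemmatizer has been loaded as `nlp`, the same function could be called as `sentences_with_all(nlp(text).sents, {"fox", "dog"})`, since each sentence span yields tokens with a `lemma_` attribute.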