Question

I have a list of 3-gram tuples, built as follows:

from nltk import ngrams
test_data = ["this is all test data", "this not"]

three_gram_list = []
for data in test_data:
    three_grams = ngrams(data.split(" "), 3)
    for gram in three_grams:
        three_gram_list.append(gram)

What I want to do is create a function that checks, for each 3-gram, whether given words are used within the same tuple. So I did the following:

def create_specific_trigram(three_grams, parameters1, parameters2):

    condition1 = False
    condition2 = False

    for three in three_grams:
        for num in range(1, 3):
            if three[num] in parameters1:
                condition1 = True

        for num in range(1, 3):
            if three[num] in parameters2:
                condition2 = True

        if condition1 and condition2:
            print(three)

But when I run it with some parameters:

parameters1 = ("test", "testing")
parameters2 = ("data", "datas")

for sentence in test_data:
    create_specific_trigram(three_grams, parameters1, parameters2)

I get the following output:

('all', 'test', 'data')
('all', 'test', 'data')

But I am looking for only one output per sentence. So in this case:

('all', 'test', 'data')

Any ideas on what changes I should apply?

Answer 1

When you call the function, you pass it the same value of three_grams every time, regardless of sentence. Build the 3-grams from each sentence inside the loop instead.

Try this:

test_data = ["this is all test data", "this not"]

parameters1 = ("test", "testing")
parameters2 = ("data", "datas")

#============================================
# implementation of create_specific_trigram
# ...
#============================================

for sentence in test_data:
    three_grams = ngrams(sentence.split(" "), 3)
    create_specific_trigram(three_grams, parameters1, parameters2)
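
For reference, here is a minimal sketch of the per-sentence idea that can be run without nltk. It is not the asker's exact function: the helper names trigrams and find_matching_trigrams are made up for illustration, the zip-based trigrams helper is only a stand-in for ngrams(tokens, 3), and the check deliberately looks at all three positions and evaluates both conditions afresh for every 3-gram, which differs from the range(1, 3) loops and shared flags in the question.

```python
def trigrams(tokens):
    """Hypothetical stand-in for nltk's ngrams(tokens, 3): consecutive 3-token tuples."""
    return zip(tokens, tokens[1:], tokens[2:])


def find_matching_trigrams(sentence, words1, words2):
    """Return the 3-grams of `sentence` that contain a word from each group."""
    matches = []
    for gram in trigrams(sentence.split(" ")):
        # Both checks are computed per 3-gram, so nothing carries over
        # from one 3-gram (or one sentence) to the next.
        has_first = any(word in words1 for word in gram)
        has_second = any(word in words2 for word in gram)
        if has_first and has_second:
            matches.append(gram)
    return matches


test_data = ["this is all test data", "this not"]
parameters1 = ("test", "testing")
parameters2 = ("data", "datas")

# One result list per sentence, keyed by the sentence itself.
results = {
    sentence: find_matching_trigrams(sentence, parameters1, parameters2)
    for sentence in test_data
}

for sentence, grams in results.items():
    print(sentence, "->", grams)
```

Returning a list instead of printing inside the function makes it easy to see that each sentence is handled on its own: the second sentence has fewer than three words, so it yields no 3-grams and an empty list.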
