Finding tuples that contain specific keywords

Time: 2017-03-30 11:44:19

Tags: python

I have a list of 3-gram tuples, built as follows:

from nltk import ngrams
test_data = ["this is all test data", "this not"]

three_gram_list = []
for data in test_data:
    three_grams = ngrams(data.split(" "), 3)
    for gram in three_grams:
        three_gram_list.append(gram)

What I want to do is create a function that checks, for each 3-gram, whether certain words are used in the same tuple. So I did the following:

def create_specific_trigram(three_grams, parameters1, parameters2):

    condition1 = False
    condition2 = False

    for three in three_grams:
        for num in range(1, 3):
            if three[num] in parameters1:
                condition1 = True

        for num in range(1, 3):
            if three[num] in parameters2:
                condition2 = True

        if condition1 and condition2:
            print(three)

But now, when I run it with some parameters:

parameters1 = ("test", "testing")
parameters2 = ("data", "datas")

for sentence in test_data:
    create_specific_trigram(three_grams, parameters1, parameters2)

I get the following output:

('all', 'test', 'data')
('all', 'test', 'data')    

But I am only looking for one output per sentence. So in this case:

('all', 'test', 'data')

Any ideas on what changes I should apply?

1 answer:

Answer 0: (score: 1)

When you call the function create_specific_trigram, you pass it the same value of three_grams every time, regardless of sentence.

Try this:

test_data = ["this is all test data", "this not"]
parameters1 = ("test", "testing")
parameters2 = ("data", "datas")

#============================================
# implementation of create_specific_trigram
# ...
#============================================

for sentence in test_data:
    three_grams = ngrams(sentence.split(" "), 3)
    create_specific_trigram(three_grams, parameters1, parameters2)
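For reference, here is a minimal self-contained sketch of the per-sentence approach. It is not the asker's exact code: `trigrams` is a hypothetical stand-in for `nltk.ngrams(tokens, 3)` so the example runs without NLTK, and `find_specific_trigrams` is an illustrative name. It also makes two changes that go beyond the answer: the conditions are recomputed for every trigram instead of being carried over from earlier ones, and all three positions are checked (the question's `range(1, 3)` only looks at positions 1 and 2).

```python
def trigrams(tokens):
    """Hypothetical stand-in for nltk.ngrams(tokens, 3); yields 3-tuples."""
    return zip(tokens, tokens[1:], tokens[2:])


def find_specific_trigrams(three_grams, parameters1, parameters2):
    """Return the trigrams that contain a word from each parameter group."""
    matches = []
    for three in three_grams:
        # Recompute both conditions for every trigram, over all three positions.
        condition1 = any(word in parameters1 for word in three)
        condition2 = any(word in parameters2 for word in three)
        if condition1 and condition2:
            matches.append(three)
    return matches


test_data = ["this is all test data", "this not"]
parameters1 = ("test", "testing")
parameters2 = ("data", "datas")

results = {}
for sentence in test_data:
    # Build the trigrams from the current sentence only.
    results[sentence] = find_specific_trigrams(
        trigrams(sentence.split(" ")), parameters1, parameters2
    )
    for three in results[sentence]:
        print(three)
```

Returning the matches instead of printing inside the function is a design choice made here so that the result for each sentence can be inspected or limited to a single tuple if needed.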