Question

So I have some sentences in a list:

some_list = ['Joe is travelling via train.',
             'Joe waited for the train, but the train was late.',
             "Even after an hour, there was no sign of the train. "
             "Joe then went to talk to station master about the train's situation."]

Then I used nltk's sentence tokenizer, because I want to analyze each sentence of every full text separately. So the output now looks like this, as a list of lists:
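A list of lists like this comes from tokenizing each string on its own, so each original string becomes one inner list. A minimal sketch, assuming a hypothetical stand-in function `split_sentences` in place of `nltk.sent_tokenize` (so it runs without downloading tokenizer data). The stand-in splits after commas as well as periods only so that it reproduces the output shown in the question; nltk's Punkt tokenizer itself does not split on commas.

```python
import re

# The sample input from the question: one string per full text.
some_list = [
    "Joe is travelling via train.",
    "Joe waited for the train, but the train was late.",
    "Even after an hour, there was no sign of the train. "
    "Joe then went to talk to station master about the train's situation.",
]


def split_sentences(text):
    """Hypothetical stand-in for nltk.sent_tokenize.

    Splits after '.' or ',' followed by whitespace, which mimics the
    output shown in the question (a real sentence tokenizer would not
    split on commas).
    """
    return re.split(r"(?<=[.,])\s+", text)


# Tokenizing each string separately yields one inner list per string.
sent_tokenize_list = [split_sentences(text) for text in some_list]
print(sent_tokenize_list)
```

With nltk installed and its tokenizer data available, the same comprehension would call `nltk.sent_tokenize(text)` instead of `split_sentences(text)`.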

sent_tokenize_list = [['Joe is travelling via train.'],
                      ['Joe waited for the train,',
                       'but the train was late.'],
                      ['Even after an hour,',
                       'there was no sign of the train.',
                       "Joe then went to talk to station master about the train's situation."]]

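Now, from this list of lists, how can I select only the lists that contain more than one sentence (in my example, the second and third lists) and get each of them as a separate list?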

That is, the output should be:

['Joe waited for the train,', 'but the train was late.']
['Even after an hour,', 'there was no sign of the train.',
 "Joe then went to talk to station master about the train's situation."]

Answer 1

You can use len to check the number of sentences in each inner list.

For example:

sent_tokenize_list = [['Joe is travelling via train.'],
                      ['Joe waited for the train,',
                       'but the train was late.'],
                      ['Even after an hour,',
                       'there was no sign of the train.',
                       "Joe then went to talk to station master about the train's situation."]]


print([i for i in sent_tokenize_list if len(i) >= 2])

Output:

[['Joe waited for the train,', 'but the train was late.'], ['Even after an hour,', 'there was no sign of the train.', "Joe then went to talk to station master about the train's situation."]]
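The question asks for each qualifying list on its own rather than one combined list. A minimal sketch: filter with `len(sub) > 1` (equivalent to `len(i) >= 2` above) and then loop over the result, printing each sublist separately. The name `multi_sentence_lists` is only an illustrative choice.

```python
sent_tokenize_list = [
    ["Joe is travelling via train."],
    ["Joe waited for the train,", "but the train was late."],
    [
        "Even after an hour,",
        "there was no sign of the train.",
        "Joe then went to talk to station master about the train's situation.",
    ],
]

# Keep only the inner lists that hold more than one sentence.
multi_sentence_lists = [sub for sub in sent_tokenize_list if len(sub) > 1]

# Print each qualifying list on its own line, matching the requested output.
for sub in multi_sentence_lists:
    print(sub)
```

This prints the two sublists on separate lines:

['Joe waited for the train,', 'but the train was late.']
['Even after an hour,', 'there was no sign of the train.', "Joe then went to talk to station master about the train's situation."]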

Selecting a sublist from a list of lists after using the sentence tokenizer
