Question

I have a list of English sentences (each sentence is itself a list of words), and I want to get n-grams from them. For example:

sentences = [['this', 'is', 'sentence', 'one'], ['hello','again']]

In order to run

nltk.util.ngrams

I need to flatten the list to:

sentences = ['this','is','sentence','one','hello','again']

But then I get a spurious bigram that spans two sentences:

('one', 'hello')

What is the best way to deal with this?

Thanks!

Answer 1

Try this:

from itertools import chain

sentences = list(chain(*sentences))

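chain returns a chain object whose .__next__() method returns elements from the first iterable until it is exhausted, then elements from the next iterable, and so on until all of the iterables are exhausted.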
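A minimal sketch of that behaviour, using only the standard library. chain.from_iterable is the equivalent form that takes the outer list directly instead of unpacking it with *; the variable names are only illustrative.

```python
from itertools import chain

sentences = [['this', 'is', 'sentence', 'one'], ['hello', 'again']]

# chain is lazy: no elements are copied until they are requested.
words = chain.from_iterable(sentences)  # same elements as chain(*sentences)

first = next(words)  # first element of the first inner list
rest = list(words)   # remaining elements, continuing into the second list

flat = [first] + rest
print(flat)
```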

Or you can do this:

sentences = [i for s in sentences for i in s]
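Note that flattening on its own does not remove the ('one', 'hello') bigram from the question, because the sentence boundary is lost. One possible way around it, sketched here under the assumption that n-grams should never span two sentences, is to build the n-grams per sentence and then chain the results. The helper name sentence_ngrams is made up for this illustration; nltk.util.ngrams could be called on each sentence in its place.

```python
from itertools import chain

def sentence_ngrams(words, n):
    """Hypothetical helper: yield the n-grams (as tuples) inside one sentence."""
    # Zip the word list against copies of itself shifted by 1 .. n-1 positions.
    return zip(*(words[i:] for i in range(n)))

sentences = [['this', 'is', 'sentence', 'one'], ['hello', 'again']]

# Build bigrams sentence by sentence, then flatten the results.
bigrams = list(chain.from_iterable(sentence_ngrams(s, 2) for s in sentences))
print(bigrams)
```

Because each sentence is processed separately, no tuple can contain words from two different sentences.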

Answer 2

You can also use a list comprehension:

f = []
# extend() mutates f in place; the comprehension is used only for its side effect
[f.extend(_l) for _l in sentences]

# f is now ['this', 'is', 'sentence', 'one', 'hello', 'again']
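The comprehension above also builds a throwaway list of None values, since extend() returns None. If that is a concern, a plain loop does the same job; this is just a sketch of the alternative, with illustrative variable names.

```python
sentences = [['this', 'is', 'sentence', 'one'], ['hello', 'again']]

flat = []
for sentence in sentences:
    # extend() appends every word of the sentence to flat, in order
    flat.extend(sentence)

print(flat)
```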