Question

I am trying to produce a list of bigrams for a given sentence. For example, if I type:

    To be or not to be

I want the program to generate:

    to be, be or, or not, not to, to be

I tried the following code, but it just gives me:

    <generator object bigrams at 0x0000000009231360>

Here is my code:

    import nltk
    bigrm = nltk.bigrams(text)
    print(bigrm)

How do I get what I want? I want a list of the word combinations shown above (to be, be or, or not, not to, to be).

Answer 1

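nltk.bigrams() returns an iterator (specifically a generator) of bigrams. If you want a list, pass the iterator to list(). It also expects a sequence of items to build the bigrams from, so you have to split the text before passing it in (if you have not already done so):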

    bigrm = list(nltk.bigrams(text.split()))
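To see what kind of object `nltk.bigrams()` hands back without installing NLTK, here is a stdlib-only sketch that pairs each token with the next one using `zip`. The helper name `simple_bigrams` is hypothetical, and it assumes whitespace tokenization via `str.split()`. Like `nltk.bigrams()`, it returns a lazy iterator, so `list()` is needed to materialize it. It also shows why splitting matters: a string is itself a sequence of characters, so passing it unsplit pairs up characters instead of words.

```python
from typing import Iterator, Sequence, Tuple


def simple_bigrams(tokens: Sequence[str]) -> Iterator[Tuple[str, str]]:
    """Yield consecutive (left, right) pairs lazily (hypothetical helper)."""
    # zip stops at the shorter input, so the last token only appears on the right.
    return zip(tokens, tokens[1:])


text = "To be or not to be"
tokens = text.split()              # whitespace tokenization, as in the answer
pairs = simple_bigrams(tokens)     # lazy iterator, similar to nltk.bigrams()
print(pairs)                       # an iterator object, not the pairs themselves
print(list(pairs))
# [('To', 'be'), ('be', 'or'), ('or', 'not'), ('not', 'to'), ('to', 'be')]

# Without split(), the string is iterated character by character.
print(list(simple_bigrams(text))[:3])
# [('T', 'o'), ('o', ' '), (' ', 'b')]
```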

To print them separated by commas, you can do this (in Python 3):

    print(*map(' '.join, bigrm), sep=', ')
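The Python 3 one-liner works because `map(' '.join, bigrm)` turns each tuple into a string such as `'to be'`, the `*` unpacks those strings as separate positional arguments to `print`, and `sep=', '` is written between them. The sketch below uses a hard-coded list as a stand-in for the NLTK output and checks that the printed text matches a string built explicitly with `str.join`.

```python
import io
from contextlib import redirect_stdout

# Stand-in for list(nltk.bigrams("to be or not to be".split())).
bigrm = [("to", "be"), ("be", "or"), ("or", "not"), ("not", "to"), ("to", "be")]

# Capture what the print-based one-liner writes to stdout.
buffer = io.StringIO()
with redirect_stdout(buffer):
    print(*map(" ".join, bigrm), sep=", ")
printed = buffer.getvalue()

# The same text built as a string; print() also appends a newline.
joined = ", ".join(map(" ".join, bigrm))

print(joined)                    # to be, be or, or not, not to, to be
print(printed == joined + "\n")  # True
```

Building the string with `", ".join(...)` is handy when you need the result as a value rather than just printed output.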

On Python 2, for example:

    print ', '.join(' '.join((a, b)) for a, b in bigrm)

Note that if you only want to print the bigrams, you do not need to build a list; you can use the iterator directly.
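Using the iterator directly is fine for a single pass, but a generator can only be consumed once: after you print from it, it is exhausted. This sketch uses a `zip` iterator as a stand-in for the generator returned by `nltk.bigrams()`.

```python
tokens = "to be or not to be".split()
bigrams_iter = zip(tokens, tokens[1:])  # one-shot iterator, like nltk.bigrams()

first_pass = ", ".join(" ".join(pair) for pair in bigrams_iter)
second_pass = ", ".join(" ".join(pair) for pair in bigrams_iter)

print(first_pass)         # to be, be or, or not, not to, to be
print(repr(second_pass))  # '' because the iterator is already exhausted

# If the bigrams are needed more than once, store them in a list first.
bigrm = list(zip(tokens, tokens[1:]))
again = ", ".join(" ".join(pair) for pair in bigrm)
print(again == first_pass)  # True, and bigrm can still be reused afterwards
```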

Answer 2

The following code generates the bigrams for the given sentence:

    >>> import nltk
    >>> from nltk.tokenize import word_tokenize
    >>> text = "to be or not to be"
    >>> tokens = word_tokenize(text)
    >>> bigrm = nltk.bigrams(tokens)
    >>> print(*map(' '.join, bigrm), sep=', ')
    to be, be or, or not, not to, to be
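The question's expected output is all lowercase ("to be" rather than "To be"), while both answers keep the original casing; this answer sidesteps it by typing the sentence in lowercase. If the input may contain capitals and punctuation, one option is to normalize before pairing. The sketch below avoids NLTK and assumes a simple regular expression, `[a-z']+`, as a stand-in tokenizer that drops punctuation. That is a deliberate simplification: `word_tokenize` keeps punctuation marks as separate tokens, which would then show up in the bigrams. The helper name `normalized_bigrams` is hypothetical.

```python
import re
from typing import List, Tuple


def normalized_bigrams(sentence: str) -> List[Tuple[str, str]]:
    """Lowercase, keep word-like tokens, then pair neighbours (hypothetical helper)."""
    # Assumed tokenizer: runs of lowercase letters and apostrophes; punctuation is dropped.
    tokens = re.findall(r"[a-z']+", sentence.lower())
    return list(zip(tokens, tokens[1:]))


pairs = normalized_bigrams("To be, or not to be?")
print(", ".join(" ".join(pair) for pair in pairs))
# to be, be or, or not, not to, to be
```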

Generating bigrams using NLTK