Question

I want to take a sentence:

sentence = "How many people are here"

and return a list of phrases:

pairs = ["How many", "many people", "people are", "are here"]

I tried:

   tokens = nltk.word_tokenize(sentence)
   pairs = nltk.bigrams(tokens)

but instead I get <generator object bigrams at 0x103697820>

I'm new to NLTK, so sorry if this is basic :) Any help is appreciated!

Answer 1

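As you mentioned, the nltk.bigrams() function returns a generator object. The generator has to be iterated to get its values. This can be done with list(), or by looping over the generator.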
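As a minimal sketch of those two options, here is a hand-written stand-in generator. The `pairwise_tokens` helper is hypothetical and not part of NLTK; it only mimics the lazy behaviour so the snippet runs without NLTK installed.

```python
def pairwise_tokens(tokens):
    """Hypothetical stand-in for nltk.bigrams: lazily yield adjacent token pairs."""
    for left, right in zip(tokens, tokens[1:]):
        yield (left, right)


tokens = ["How", "many", "people", "are", "here"]

# Option 1: materialize every pair at once with list().
as_list = list(pairwise_tokens(tokens))

# Option 2: loop over the generator and handle one pair at a time.
looped = []
for pair in pairwise_tokens(tokens):
    looped.append(pair)

# A generator is consumed by iteration, so a second pass over the
# same object yields nothing.
gen = pairwise_tokens(tokens)
first_pass = list(gen)
second_pass = list(gen)
```

Either option gives the same pairs; list() is the shorter choice when all pairs are needed at once.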

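Below, I iterate over the generator object (the result of nltk.bigrams()) in a list comprehension, using " ".join() to combine each pair of words yielded by the generator into a single string, as required.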

import nltk

tokens = nltk.word_tokenize(sentence)
# Join each (word, next_word) tuple into one space-separated phrase.
pairs = [" ".join(pair) for pair in nltk.bigrams(tokens)]

['How many', ...]
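For comparison, the same pairing can be sketched in plain Python. This assumes whitespace splitting is an acceptable stand-in for nltk.word_tokenize, which it is not in general, since punctuation stays attached to the neighbouring word.

```python
sentence = "How many people are here"

# Assumption: str.split() stands in for nltk.word_tokenize here.
tokens = sentence.split()

# zip pairs each token with its successor; " ".join turns each pair
# into a single phrase.
pairs = [" ".join(pair) for pair in zip(tokens, tokens[1:])]
print(pairs)
```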

Answer 2

This should solve your problem:

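import re

# Raw string so the backslashes in the Windows path are not read as escapes.
with open(r'D:\Jupyter notebook\SNPQ.txt', 'r') as f:
    text = f.read()

# Drop one leading and one trailing newline, then process line by line.
text = re.sub(r'^\n|\n$', '', text)
for line in text.splitlines():
    fields = re.split('(?<=^[0-9]{2})([0-9]{13}| {13})|  +', line.strip())
    # Escape embedded quotes, trim each field, and emit a quoted CSV-style row.
    print('"' + '","'.join(field.replace('"', '\\"').strip() for field in fields if field is not None) + '"')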

Thanks :)

How to get word pairs from a sentence with NLTK?

2 answers: