I'm fairly comfortable with nltk, but I'm stuck. I want to split a text file into individual sentences and set each sentence as a variable for later use. I have the first part taken care of:
import nltk
from nltk.tokenize import sent_tokenize
text1 = open('/Users/joshuablew/Documents/myCorpus/version1.txt').read()
sent_tokenize(text1)
This prints out each separate sentence:
['Who was the 44th president of the United States?', 'Where does he live?', 'This is just a plain sentence.', 'As well as this one, just to break up the questions.', 'How many houses make up the United States Congress?', 'What are they called?', 'Again, another question breakpoint here.', 'Who is our current President?', 'Can he run for re-election?', 'Why or why not?']
From here, I don't know what to do to save these sentences into variables automatically.
Alternatively, is it possible to use indexing, so that text1[0] = 'Who was the 44th president of the United States?',
text1[1] = 'Where does he live?',
and so on, with each index of the text file holding one individual sentence?
Thanks for any help.
Answer 0 (score: -1)
import nltk
from nltk.tokenize import sent_tokenize

with open('1.txt', 'r') as myfile:
    text = myfile.read()

# sent_tokenize returns a list of sentences
textList = sent_tokenize(text)
print(len(textList))  # number of sentences
print(textList)