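I am reading a file that contains text and passing its contents on to extract noun phrases. The noun phrases are printed correctly, but when I write them to a text file, either only the first phrase is written or nothing is written at all. Below is the code I wrote to write to the text file.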
import nltk
import re
file = open("C:\datafiles\entytest.txt", "r")
doclist = [ line for line in file ]
docstr = '' . join(doclist)
sentences = re.split(r'[.!?]', docstr)
grammar = '\n'.join([
    'NP: {<DT>*<NN>*<NN>}',
])
for sentence in sentences:
    words = nltk.word_tokenize(sentence)
    tags = nltk.pos_tag(words)
    chunkparser = nltk.RegexpParser(grammar)
    nnphrs = chunkparser.parse(tags)
    print(nnphrs)
f = open("C:\datafiles\nphrs.txt", "w")
for sentence in sentences:
    f.write("'%s',\n" %nnphrs)
f.close()
Answer 0 (score: 0)
If you want the phrases to end up in the txt file, you should write them inside the loop, like this:
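# Raw string, so that "\n" in the path is not read as a newline escape.
f = open(r"C:\datafiles\nphrs.txt", "w")
for sentence in sentences:
    words = nltk.word_tokenize(sentence)
    tags = nltk.pos_tag(words)
    chunkparser = nltk.RegexpParser(grammar)
    nnphrs = chunkparser.parse(tags)
    # Write inside the loop so every sentence's result reaches the file.
    f.write("'%s',\n" % nnphrs)
    print(nnphrs)
f.close()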
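A side note on the Windows path used in these snippets: in an ordinary (non-raw) Python string literal, `\n` is the newline escape, so a path ending in `\nphrs.txt` written without the `r` prefix does not name the file you might expect. The sketch below only compares the two literal forms and touches no files. The shortened path fragment is an assumption made for illustration.

```python
# Ordinary literal: "\n" is turned into a single newline character.
plain = "datafiles\nphrs.txt"

# Raw literal: the backslash and the "n" are kept as two separate characters.
raw = r"datafiles\nphrs.txt"

print("\n" in plain)  # True: the string contains a real newline
print("\n" in raw)    # False: the backslash was kept literally
```

Besides a raw string, doubling the backslashes (`"C:\\datafiles\\nphrs.txt"`) or using forward slashes avoids the same problem.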
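Answer 1 (score: 0)
As Khelwood said, because of the incorrect indentation you end up handling only one sentence.
Unlike many other languages, Python determines block structure from indentation. Statements that are indented further than a loop header are part of that loop, and the same holds for other constructs.
You can read more about this here.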
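# Raw string, so that "\n" in the path is not read as a newline escape.
f = open(r"C:\datafiles\nphrs.txt", "w")
for sentence in sentences:
    # Everything indented under the for statement runs once per sentence.
    words = nltk.word_tokenize(sentence)
    tags = nltk.pos_tag(words)
    chunkparser = nltk.RegexpParser(grammar)
    nnphrs = chunkparser.parse(tags)
    print(nnphrs)
    f.write("'%s',\n" % nnphrs)
# Back at the outer indentation level: runs once, after the loop.
f.close()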
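To see the effect of indentation in isolation, here is a minimal sketch that needs neither NLTK nor a real file. `fake_parse` is a hypothetical stand-in for `chunkparser.parse`, the sample sentences are made up, and `io.StringIO` stands in for the output file. All three are assumptions for illustration only.

```python
import io


def fake_parse(sentence):
    # Hypothetical stand-in for the NLTK chunk parser: wraps the text in "(S ...)".
    return "(S %s)" % sentence.strip()


sentences = ["The cat sat", " A dog barked", " The bird flew"]

# Version 1: the write sits in a second loop that runs after parsing is done.
# By then nnphrs holds only the result for the last sentence.
outside = io.StringIO()
for sentence in sentences:
    nnphrs = fake_parse(sentence)
for sentence in sentences:
    outside.write("'%s',\n" % nnphrs)

# Version 2: the write is indented into the parsing loop,
# so each sentence's own result is written.
inside = io.StringIO()
for sentence in sentences:
    nnphrs = fake_parse(sentence)
    inside.write("'%s',\n" % nnphrs)

print(outside.getvalue())  # the last result, repeated three times
print(inside.getvalue())   # one line per sentence
```

The only difference between the two versions is where the `write` call sits relative to the parsing loop.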
Answer 2 (score: 0)
I would use print to write to the file:
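# Raw string, so that "\n" in the path is not read as a newline escape.
# The with block closes the file automatically when it ends.
with open(r"C:\datafiles\nphrs.txt", "w") as f:
    for sentence in sentences:
        words = nltk.word_tokenize(sentence)
        tags = nltk.pos_tag(words)
        chunkparser = nltk.RegexpParser(grammar)
        nnphrs = chunkparser.parse(tags)
        # print converts nnphrs to text and appends a newline.
        print(nnphrs, file=f)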
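The same pattern can be tried without NLTK. This is a small sketch in which the `results` list holds made-up strings standing in for parse trees and the output goes to a temporary directory rather than `C:\datafiles`. Both are assumptions for illustration.

```python
import os
import tempfile

# Made-up stand-ins for the parse results (assumption, not real NLTK output).
results = ["(S (NP the/DT cat/NN))", "(S (NP a/DT dog/NN))"]

# Write into a fresh temporary directory so the sketch runs anywhere.
out_path = os.path.join(tempfile.mkdtemp(), "nphrs.txt")

# The file is closed automatically when the with block ends,
# which also flushes everything that was printed into it.
with open(out_path, "w") as f:
    for r in results:
        print(r, file=f)  # print adds the trailing newline itself

# Read the file back to check that every result was written.
with open(out_path) as f:
    written = f.read().splitlines()

print(written)
```

Because `print` handles both the string conversion and the newline, no `%s` formatting or explicit `f.close()` call is needed.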