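I want to extract nouns or noun groups from a huge text file. The Python code below runs, but it only extracts the nouns from the last line. I'm fairly sure the code needs an 'append' somewhere, but I don't know how to add it (I'm a Python beginner).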
import nltk
import nltk.tokenize
f = open(r'infile.txt', encoding="utf8")
data = f.readlines()
tagged_list = []
for line in data:
    # tokenize and part-of-speech tag each line
    tokens = nltk.word_tokenize(line)
    tagged = nltk.pos_tag(tokens)
    # keep only the noun tags (NN, NNP, NNS, NNPS)
    nouns = [word for word, pos in tagged
             if pos in ('NN', 'NNP', 'NNS', 'NNPS')]
    downcased = [x.lower() for x in nouns]
    joined = " ".join(downcased).encode('utf-8')
    into_string = str(nouns)
output = open(r"outfile.csv", "wb")
output.write(joined)
output.close()
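The result is "apartment transportation downtown", which are the nouns from the last line of the file. I want the nouns from each line of the input file saved on a separate line of the output. For example, the input file and the corresponding result should look like this: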
Input file:
I like the milk.
I like the milk and bread.
I like the milk, bread, and butter.
Output file:
milk
milk bread
milk bread butter
I hope someone can help me fix the code above.
Answer (score: 2)
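Your loop overwrites `joined` on every pass, so by the time the file is written only the last line's nouns are left. Accumulate each line's result inside the for loop, then write everything to the file after the loop ends: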
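...
result = ""
for line in data:
    ...  # same tagging code as in the question, but build joined as a str:
    joined = " ".join(downcased)  # no .encode(): the file is opened in text mode
    result += joined + "\n"  # add this line's nouns as their own output line
output = open(r"outfile.csv", "w", encoding="utf-8")
output.write(result)
output.close()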
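The core of the fix is that each line's nouns are collected instead of overwritten. A minimal sketch of just that step, assuming the input is already tagged as lists of (word, tag) pairs like those `nltk.pos_tag` returns; the function name `nouns_per_line` and the hand-written tags in the usage part are illustrative, not real NLTK output:

```python
# Penn Treebank noun tags, matching the filter in the question.
NOUN_TAGS = {"NN", "NNS", "NNP", "NNPS"}


def nouns_per_line(tagged_lines):
    """Return one lowercase, space-separated noun string per tagged line.

    Hypothetical helper: `tagged_lines` is an iterable of lists of
    (word, tag) pairs, one list per input line.
    """
    results = []
    for tagged in tagged_lines:
        nouns = [word.lower() for word, tag in tagged if tag in NOUN_TAGS]
        results.append(" ".join(nouns))  # accumulate instead of overwriting
    return results


# Hand-written tags for the question's example (illustrative only).
example = [
    [("I", "PRP"), ("like", "VBP"), ("the", "DT"), ("milk", "NN"), (".", ".")],
    [("I", "PRP"), ("like", "VBP"), ("the", "DT"), ("milk", "NN"),
     ("and", "CC"), ("bread", "NN"), (".", ".")],
    [("I", "PRP"), ("like", "VBP"), ("the", "DT"), ("milk", "NN"), (",", ","),
     ("bread", "NN"), (",", ","), ("and", "CC"), ("butter", "NN"), (".", ".")],
]
output_text = "\n".join(nouns_per_line(example)) + "\n"
print(output_text, end="")
```

Keeping the tagging separate from the accumulation makes the loop logic easy to check without downloading NLTK models.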
If you want to use append:
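...
result_list = []
for line in data:
    ...  # same tagging code as in the question
    joined = " ".join(downcased)  # keep joined as a str, not bytes
    result_list.append(joined)  # one entry per input line
output = open(r"outfile.csv", "w", encoding="utf-8")
output.write("\n".join(result_list) + "\n")  # one output line per entry
output.close()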
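Calling `str()` on a list or a bytes object gives its Python representation rather than plain text, which is why joining with newlines (and not encoding to bytes) matters when writing the file. A quick illustration with the example's expected values:

```python
nouns = ["milk", "milk bread", "milk bread butter"]

# str() of a list gives its repr, brackets and quotes included.
as_repr = str(nouns)
# Joining with "\n" gives one plain-text line per item.
as_lines = "\n".join(nouns)
# str() of a bytes object keeps the b'...' wrapper.
bytes_repr = str("milk bread".encode("utf-8"))

print(as_repr)
print(as_lines)
print(bytes_repr)
```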
Also, if you use the result list, you can write it out one item per line like this:
...
with open(r"outfile.csv", "w", encoding="utf-8") as output:
    for item in result_list:
        # assumes result_list holds str values (joined built without .encode())
        output.write(item + "\n")