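I am trying to remove every part of speech except nouns from a text.
I wrote a function for this. It is probably not the best or most optimized code, since I have only just started coding in Python. I am sure the bug is very basic, but I cannot figure it out.
The function takes two arguments: the location of the input text file on disk, and the location of the file where the output should be written.
Here is the code.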
def extract_nouns(i_location, o_location):
    import nltk

    with open(i_location, "r") as myfile:
        data = myfile.read().replace('\n', '')

    tokens = nltk.word_tokenize(data)
    tagged = nltk.pos_tag(tokens)
    length = len(tagged)
    a = list()

    for i in range(0,length):
        print(i)
        log = (tagged[i][1][0] == 'N')
        if log == False:
            a.append(tagged[i][0])

    fin = open(i_location, 'r')
    fout = open(o_location, "w+")

    for line in fin:
        for word in a:
            line = line.replace(word, "")
        fout.write(line)

    with open(o_location, "r") as myfile_new:
        data_out = myfile_new.read().replace('\n', '')

    return data_out
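When I call this function, it mostly works: the output file on disk contains what I expect. However, the function does not return that output in the interpreter. More precisely, it returns an empty string instead of the actual output string.
This is how I call it.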
t = extract_nouns("input.txt","output.txt")
If you want to try it, use the following as the contents of the input file:
"At eight o'clock on
Thursday film morning word line test
best beautiful Ram Aaron design"
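When I call the function, this is what I get in the output file (output.txt), but the function itself returns an empty string. It does not even print the output.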
"
Thursday film morning word line test
Ram Aar design"
Answer 0: (score: 1)
You need to close the output file before reading it back:
for line in fin:
    for word in a:
        line = line.replace(word, "")
    fout.write(line)
fout.close()
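To illustrate why closing matters, here is a minimal sketch, assuming CPython with default buffering. A small write usually stays in an in-memory buffer until the file is flushed or closed, so a second handle opened on the same path sees nothing yet. The temporary path and the variable names are made up for this demonstration.

```python
import os
import tempfile

# Hypothetical scratch file, used only for this demonstration.
path = os.path.join(tempfile.mkdtemp(), "output.txt")

fout = open(path, "w+")
fout.write("Thursday film\n")  # small write, normally still buffered in memory

# Read through a second handle while fout is still open and unflushed.
with open(path, "r") as reader:
    before_close = reader.read()

fout.close()  # closing flushes the buffer to disk

with open(path, "r") as reader:
    after_close = reader.read()

print(repr(before_close))
print(repr(after_close))
```

This mirrors what happens in the question: the file is complete on disk once the program ends, but the read inside the function runs before the buffered data has been written.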
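Using with is generally the best way to open files, because it closes them automatically. You can also call file.seek() to go back to the start of the file and read what was just written: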
def extract_nouns(i_location, o_location):
    import nltk

    with open(i_location, "r") as myfile:
        data = myfile.read().replace('\n', '')

    tokens = nltk.word_tokenize(data)
    tagged = nltk.pos_tag(tokens)
    length = len(tagged)
    a = []

    for i in range(0,length):
        print(i)
        log = (tagged[i][1][0] == 'N')
        if not log:
            a.append(tagged[i][0])

    with open(i_location, 'r') as fin, open(o_location, "w+") as fout:
        for line in fin:
            for word in a:
                line = line.replace(word, "")
            fout.write(line)
        fout.seek(0) # go back to start of file
        data_out = fout.read().replace('\n' , '')

    return data_out
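A side note on the output shown in the question: "Aaron" turned into "Aar", presumably because "on" from the first line was tagged as a non-noun and str.replace removes it wherever it occurs, even inside other words. A possible alternative is to filter the tagged tokens directly instead of editing the raw text. The sketch below uses a hypothetical helper keep_nouns and hand-written (word, tag) pairs, not actual nltk output, so it runs without nltk installed.

```python
def keep_nouns(tagged):
    """Join the words whose tag starts with 'N' (NN, NNS, NNP, ...)."""
    return " ".join(word for word, tag in tagged if tag.startswith("N"))


# Hand-written (word, tag) pairs for illustration; not real nltk.pos_tag output.
tagged = [
    ("best", "JJS"),
    ("beautiful", "JJ"),
    ("on", "IN"),
    ("Ram", "NNP"),
    ("Aaron", "NNP"),
    ("design", "NN"),
]

# Substring replacement also hits the "on" inside "Aaron".
substring_result = "Ram Aaron design".replace("on", "")

# Token-level filtering keeps whole words intact.
token_result = keep_nouns(tagged)

print(substring_result)  # Ram Aar design
print(token_result)      # Ram Aaron design
```

Filtering tokens loses the original line breaks and spacing, so whether it fits depends on how much of the input layout needs to be preserved.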
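Answer 1: (score: 0)
The last statement in the function should be return.
Because the function has print data_out, what you get back is the return value of print, which is None.
E.g.: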
In []: def test():
  ..:     print 'Hello!'
  ..:

In []: res = test()
Hello!

In []: res is None
Out[]: True
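The same point in Python 3 syntax, as a small sketch with made-up function names: a function that only prints hands None back to the caller, while one that uses return hands back the value itself.

```python
def print_only():
    print("Hello!")  # writes to stdout; the function implicitly returns None


def return_value():
    return "Hello!"  # hands the string back to the caller


res_printed = print_only()
res_returned = return_value()

print(res_printed is None)  # True
print(res_returned)         # Hello!
```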