Question

from nltk.corpus import stopwords
print "starting to read \n"

fw=open('cde.txt','w');

with open('test.txt') as fp:
    for line in fp:
                fw.write('\n')
                fw.write(line)
fp.close()
fw.close()

print "\ndone with writing \n"

print "starting to print from another file \n"

with open('cde.txt','r+') as ss:
    for line in ss:
        for word in line.split():
                if word in stopwords.words('english'):
                        #ss.write(line.remove(word))
                        ss.remove(word)

 #print line.rstrip()
ss.close()

#for word in line.split():

print "done with printing from another file"

I am running this script but keep getting this error:

AttributeError: 'file' object has no attribute 'remove'

Answer 1

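Since the question does not include the exact traceback, I am guessing that the failure comes from the call to ss.remove(). In this code ss is a file handle, and (as the error says) file objects do not support a remove() method.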
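A quick way to check this is to ask each object whether it has a remove attribute. The sketch below is written for Python 3 and uses a throwaway temporary file created only for the illustration; on Python 3 the same mistake is reported against '_io.TextIOWrapper' rather than 'file'.

```python
import os
import tempfile

# Create a small throwaway file so the example is self-contained.
fd, path = tempfile.mkstemp(suffix=".txt")
with os.fdopen(fd, "w") as handle:
    handle.write("this is a test\n")

with open(path) as ss:
    file_has_remove = hasattr(ss, "remove")   # file objects have no remove()
    words = ss.readline().split()

list_has_remove = hasattr(words, "remove")    # lists do have remove()
words.remove("is")                            # works on the list, not on the file

os.remove(path)                               # os.remove deletes the file itself

print(file_has_remove, list_has_remove, words)
```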

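If you want to delete the file, you can use os.remove(filepath), but that does not appear to be what this code is doing. Instead, the code is trying to remove a word from the file, which is not an operation a file object supports.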

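If you want to remove words from the file, a simple approach is to write only the content you want to keep to another file (such as a temporary file) and, once processing is done, replace the old file with the newly generated one (and possibly delete the temporary file at the end).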
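The temporary-file approach could look roughly like the following. This is only a sketch for Python 3: the function name remove_stop_words_in_place and the small STOP_WORDS set are made up for illustration, and the matching is case-sensitive, as in the question's code.

```python
import os
import tempfile

# Hypothetical stop-word list for illustration; with NLTK installed you could
# use set(stopwords.words('english')) instead.
STOP_WORDS = {"a", "an", "the", "is", "of", "and"}


def remove_stop_words_in_place(path, stop_words=STOP_WORDS):
    """Rewrite `path` without stop words by going through a temporary file."""
    directory = os.path.dirname(os.path.abspath(path))
    # Create the temporary file next to the original so the final swap
    # stays on the same filesystem.
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "w") as tmp, open(path) as src:
            for line in src:
                kept = [w for w in line.split() if w not in stop_words]
                tmp.write(" ".join(kept) + "\n")
        os.replace(tmp_path, path)  # swap the new file in for the old one
    except Exception:
        os.remove(tmp_path)         # clean up the temporary file on failure
        raise
```

Calling remove_stop_words_in_place('cde.txt') would then rewrite that file. Note that os.replace is available from Python 3.3 onward.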

If you want to exclude stopwords from the data, you can hold each line's words in a list, like this:

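from nltk.corpus import stopwords

# Build the stop-word set once; a set makes each membership test fast.
stop_words = set(stopwords.words('english'))

with open('cde.txt.cleared', 'w') as output:
    with open('cde.txt') as ss:
        for line in ss:
            # Build a new list instead of removing items while iterating,
            # which would skip the word after each removed one.
            words = [w for w in line.split() if w not in stop_words]
            output.write(' '.join(words) + '\n')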

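Note that we open the output file in write mode. Also note that this code does not preserve the amount of whitespace between words, because it converts each line to a list and then rebuilds the line from those words. If that is a problem, I think you would need to work on the strings directly instead of splitting them into lists.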
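One possible way to work on the string while leaving its whitespace alone is to split with a capturing regular expression, so the separators are kept as tokens. This is a sketch, not the only option: strip_stop_words and STOP_WORDS are hypothetical names, and the whitespace on both sides of a removed word is left in place, so gaps around removed words merge.

```python
import re

# Hypothetical stop-word list for illustration; with NLTK installed you could
# use set(stopwords.words('english')) instead.
STOP_WORDS = {"a", "an", "the", "is", "of", "and"}

# The capture group makes re.split return the whitespace runs as tokens too.
_WHITESPACE = re.compile(r"(\s+)")


def strip_stop_words(line, stop_words=STOP_WORDS):
    """Drop stop words from `line` without touching any whitespace."""
    tokens = _WHITESPACE.split(line)
    return "".join(t for t in tokens if t not in stop_words)


print(repr(strip_stop_words("cat\tsat on  the mat\n")))
```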

Answer 2

I guess the OP wants to remove the stop words from the file. To do that, try:

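from nltk.corpus import stopwords

stop_words = set(stopwords.words('english'))

# Read every line first: writing to a file while iterating over it is unreliable.
with open('cde.txt') as ss:
    lines = ss.readlines()

with open('cde.txt', 'w') as ss:
    for line in lines:
        # Keep only the words that are not stop words.
        parts = [word for word in line.split() if word not in stop_words]
        ss.write(' '.join(parts) + '\n')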

I hope this helps. If not, please leave a more detailed comment.

Answer 3

This snippet reads the text from test.txt and writes the same text to the 'cde.txt' file after removing the stop words. This may help you.

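from nltk.corpus import stopwords

stop_words = set(stopwords.words('english'))

linetext = []
with open('test.txt') as ss:
    for line in ss:
        # Collect the words of this line that are not stop words.
        line1 = [word for word in line.split() if word not in stop_words]
        linetext.append(" ".join(line1))
        linetext.append('\n')

with open('cde.txt', 'w') as fw:
    fw.writelines(linetext)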
