I am using a stop-word filter. I gave the script the path to a file containing my article, but I get this error:
Traceback (most recent call last):
  File "stop2.py", line 17, in <module>
    print preprocess(sentence)
  File "stop2.py", line 10, in preprocess
    sentence = sentence.lower()
AttributeError: 'file' object has no attribute 'lower'
My code is attached below. Any ideas on how to pass a file as the argument?
# -*- coding: utf-8 -*-
from __future__ import division, unicode_literals
import string
import nltk
from nltk.tokenize import RegexpTokenizer
from nltk.corpus import stopwords
import re

def preprocess(sentence):
    sentence = sentence.lower()
    tokenizer = RegexpTokenizer(r'\w')
    tokens = tokenizer.tokenize(sentence)
    filtered_words = [w for w in tokens if not w in stopwords.words('english')]
    return " ".join(filtered_words)

sentence = open('pathtofile')
print preprocess(sentence)
Answer 0 (score: 2)
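sentence = open(...) means that sentence is an instance of file (the object returned by the open() function), and a file object has no lower() method. You seem to want the entire contents of the file as a string instead: sentence = open(...).read()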
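
As a rough illustration of that fix, here is a minimal sketch that reads the file's text first and only then passes the string to preprocess(). Several things in it are assumptions rather than part of the original script: it is written for Python 3 (print as a function), the helper preprocess_file and the small STOP_WORDS set are hypothetical names, and STOP_WORDS stands in for stopwords.words('english') so the sketch runs without NLTK or its corpus data. It also tokenizes with \w+ rather than \w, since \w alone matches one character at a time.

```python
import os
import re
import tempfile

# Hypothetical stand-in for nltk's stopwords.words('english'),
# so this sketch runs without NLTK or its downloaded corpora.
STOP_WORDS = {"the", "is", "a", "on", "of", "and"}


def preprocess(text):
    """Lowercase the text, split it into word tokens, drop stop words."""
    tokens = re.findall(r"\w+", text.lower())
    return " ".join(w for w in tokens if w not in STOP_WORDS)


def preprocess_file(path):
    """Read the whole file into a string, then pass that string on."""
    with open(path) as handle:  # the with-block closes the file afterwards
        return preprocess(handle.read())


# Demo on a throwaway file; in real use, pass the path to your article.
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.write("The cat is on the mat")
demo_path = tmp.name

result = preprocess_file(demo_path)
os.remove(demo_path)
print(result)  # prints: cat mat
```

The important point is that preprocess() only ever receives a str. Whether the text comes from open(...).read() or from a with-block as above is a matter of taste, although the with-block has the advantage of closing the file handle automatically.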