I found this Python code that performs stemming on a text file.
import nltk
import string

from collections import Counter

def get_tokens():
    with open('/Users/MYUSERNAME/Desktop/Test_sp500/A_09.txt', 'r') as shakes:
        text = shakes.read()
        lowers = text.lower()
        no_punctuation = lowers.translate(None,string.punctuation)
        tokens = nltk.word_tokenize(no_punctuation)
        return tokens


tokens = get_tokens()
count = Counter(tokens)
print(count.most_common(10))

from nltk.corpus import stopwords

tokens = get_tokens()
filtered = [w for w in tokens if not w in stopwords.words('english')]
count = Counter(filtered)
print(count.most_common(100))

from nltk.stem.porter import *

def stem_tokens(tokens, stemmer):
    stemmed = []
    for item in tokens:
        stemmed.append(stemmer.stem(item))
    return stemmed

stemmer = PorterStemmer()
stemmed = stem_tokens(filtered, stemmer)
count = Counter(stemmed)
print(count.most_common(100))
When I try to run the program, I get the following error:
Traceback (most recent call last):
  File "/Users/MYUSERNAME/Desktop/stemmer.py", line 15, in <module>
    tokens = get_tokens()
  File "/Users/MYUSERNAME/Desktop/stemmer.py", line 10, in get_tokens
    no_punctuation = lowers.translate(None,string.punctuation)
TypeError: translate() takes exactly one argument (2 given)
Now my question is:
Note: I don't usually have to program, so I only know the absolute basics of Python.
Answer 0 (score: 0)
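I assume you are running Python 3 or later.
In Python 2.7, str.translate takes two arguments (a translation table and a string of characters to delete); in Python 3 and later it takes only one, the translation table. That is essentially why you get the error.
The call in your code is the Python 2 idiom: passing None as the table means "remap nothing, just delete every character in string.punctuation". Python 3 dropped that second argument, so the same call now fails.
Instead, you need to make a translation table (for example with str.maketrans) and then pass it to translate.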
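A minimal sketch of that fix for Python 3, leaving NLTK out so it runs on its own. The names PUNCTUATION_TABLE and strip_punctuation are made up for illustration and do not come from the question; str.maketrans with three arguments treats the third as the set of characters to delete.

```python
import string

# Build the table once. The third argument of str.maketrans lists the
# characters that translate() should delete (they are mapped to None).
PUNCTUATION_TABLE = str.maketrans('', '', string.punctuation)


def strip_punctuation(text):
    """Lowercase text and remove ASCII punctuation (hypothetical helper)."""
    return text.lower().translate(PUNCTUATION_TABLE)


print(strip_punctuation("Hello, World! It's 9 a.m."))
```

In get_tokens, the failing line would then presumably become no_punctuation = lowers.translate(PUNCTUATION_TABLE), with the rest of the function unchanged. Note that string.punctuation covers ASCII punctuation only, so typographic quotes and dashes in the input file would pass through untouched.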