I have a question about this line:

    processed = process(cleaned, lemmatizer=nltk.stem.wordnet.WordNetLemmatizer())

Why does it raise an unexpected keyword argument?

    TypeError: process() got an unexpected keyword argument 'lemmatizer'

Here is my code:
    def process(text, filters=nltk.corpus.stopwords.words('english')):
        """Normalizes case and handles punctuation
        Inputs:
            text: str: raw text
            lemmatizer: an instance of a class implementing the lemmatize() method
                (the default argument is of type nltk.stem.wordnet.WordNetLemmatizer)
        Outputs:
            list(str): tokenized text
        """
        lemmatizer = nltk.stem.wordnet.WordNetLemmatizer()
        word_list = nltk.word_tokenize(text)
        lemma_list = []
        for i in word_list:
            if i not in filters:
                try:
                    lemma = lemmatizer.lemmatize(i)
                    lemma_list.append(str(lemma))
                except:
                    pass
        return " ".join(lemma_list)
    if __name__ == '__main__':
        # construct filter for processor
        file = open("accountant.txt").read().lower()
        filters = set(nltk.word_tokenize(file))
        filters.update(nltk.corpus.stopwords.words('english'))
        filters = list(filters)

        # webcrawling
        dataJobs = pd.read_csv("test.csv")
        webContent = []
        for i in dataJobs["url"]:
            content = webCrawl(i)
            webContent.append(content)

        # clean the crawled text
        cleaned_list = []
        for j in webContent:
            cleaned = extractUseful(j)
            processed = process(cleaned, lemmatizer=nltk.stem.wordnet.WordNetLemmatizer())
            cleaned_list.append(processed)

        # save to csv
        contents = pd.DataFrame({"Content": webContent, "Cleaned": cleaned_list})
        contents.to_csv("testwebcrawled.csv")
        dataJobs[['jd']] = cleaned_list
        dataJobs.to_csv("test_v2_crawled.csv")
Answer 0 (score: 0)
You only define one keyword argument, filters, in the function signature of process (the def process(...) line). If you want to pass your filter list through, use the keyword that actually appears in the signature:

    processed = process(cleaned, filters=filters)
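To see why the TypeError appears, here is a minimal sketch that runs without NLTK (a toy process, not the original): Python only accepts keyword arguments whose names are declared in the function's def line.

```python
# Toy stand-in for process(); like the original, the signature only
# declares 'filters', so 'lemmatizer=' is not a recognized keyword.
def process(text, filters=()):
    return [w for w in text.split() if w not in filters]

print(process("the cat sat", filters={"the"}))   # ['cat', 'sat']

# A keyword the signature does not declare raises TypeError at call time.
try:
    process("the cat sat", lemmatizer=object())
except TypeError as err:
    print(err)   # process() got an unexpected keyword argument 'lemmatizer'
```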
If you also want to be able to pass in a lemmatizer, you should change your function signature to:

    def process(text,
                filters=nltk.corpus.stopwords.words('english'),
                lemmatizer=nltk.stem.wordnet.WordNetLemmatizer()):
Note, however, that the values after the = in the function signature are only default arguments; you only need the = and what follows it if you want those values used when the caller omits the argument. Otherwise, you can do:
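As a quick illustration of how defaults behave (toy function names, not from the original code):

```python
# 'greeting' has a default value, so the caller may omit it.
def greet(name, greeting="hello"):
    return f"{greeting}, {name}"

print(greet("Ada"))                     # hello, Ada  (default used)
print(greet("Ada", greeting="hi"))      # hi, Ada     (default overridden)

# Without a default, the caller must always supply the argument.
def greet_required(name, greeting):
    return f"{greeting}, {name}"

print(greet_required("Ada", "hey"))     # hey, Ada
```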
    def process(text, filters, lemmatizer):
        ...

and call it as:

    processed = process(cleaned,
                        filters=nltk.corpus.stopwords.words('english'),
                        lemmatizer=nltk.stem.wordnet.WordNetLemmatizer())
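Putting it together, here is a runnable sketch of the extended signature. The NLTK objects are replaced with toy stand-ins (an IdentityLemmatizer class and a hand-written stopword tuple are assumptions made here so the example runs without NLTK installed):

```python
# Toy replacement for WordNetLemmatizer: any object with a lemmatize()
# method satisfies the interface the docstring describes.
class IdentityLemmatizer:
    def lemmatize(self, word):
        return word.rstrip("s")   # toy rule: drop a trailing plural 's'

def process(text, filters, lemmatizer):
    return " ".join(lemmatizer.lemmatize(w)
                    for w in text.split() if w not in filters)

processed = process("the cats sat",
                    filters=("the",),          # stand-in for the stopword list
                    lemmatizer=IdentityLemmatizer())
print(processed)   # cat sat
```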