NLTK in Azure ML

Time: 2016-09-26 15:30:10

Tags: python nltk azure-machine-learning-studio

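Folks,

I have the following code that creates a POS tagger with NLTK, implemented as an "Execute Python Script" module in Azure ML. The problem is that the script has to download maxent_treebank_pos_tagger every time it runs. Commenting out that line raises the error below. I even tried downloading everything with nltk.download('all'), but that did not help either.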

Error: Error 0085: The following error occurred during script evaluation, please view the output log for more information:
---------- Start of error message from Python interpreter ----------
Caught exception while executing function: Traceback (most recent call last):
  File "C:\server\invokepy.py", line 199, in batch
    odfs = mod.azureml_main(*idfs)
  File "C:\temp\febff15ac9584d978d04d40f0c7bd565.py", line 32, in azureml_main
    tagged = nltk.pos_tag(tokens)
  File "C:\pyhome\lib\site-packages\nltk\tag\__init__.py", line 99, in pos_tag
    tagger = load(_POS_TAGGER)
  File "C:\pyhome\lib\site-packages\nltk\data.py", line 605, in load
    resource_val = pickle.load(_open(resource_url))
  File "C:\pyhome\lib\site-packages\nltk\data.py", line 686, in _open
    return find(path).open()
  File "C:\pyhome\lib\site-packages\nltk\data.py", line 467, in find
    raise LookupError(resource_not_found)
LookupError:
**********************************************************************
  Resource 'taggers/maxent_treebank_pos_tagger/english.pickle' not
  found.  Please use the NLTK Downloader to obtain the resource:
  >>> nltk.download()
  Searched in:
    - 'C:\\Users\\Client/nltk_data'
    - 'C:\\nltk_data'
    - 'D:\\nltk_data'
    - 'E:\\nltk_data'
    - 'C:\\pyhome\\nltk_data'
    - 'C:\\pyhome\\lib\\nltk_data'
    - 'C:\\Users\\Client\\AppData\\Roaming\\nltk_data'
**********************************************************************
Process returned with non-zero exit code 1
---------- End of error message from Python interpreter ----------
Process exited with error code -2
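
The "Searched in" list in the error above is the set of directories NLTK checked for the resource. As a diagnostic aid, here is a minimal sketch that uses only the standard library to report which directory, if any, actually contains a given resource file. The function name locate_resource is hypothetical, and the check only approximates NLTK's own lookup (NLTK can also read resources from zip archives, which this sketch ignores).

```python
import os


def locate_resource(resource, search_dirs):
    """Return the first directory in search_dirs that contains resource.

    resource is a forward-slash path such as
    'taggers/maxent_treebank_pos_tagger/english.pickle'.
    Returns None when no directory contains it.
    Note: this is an approximation; it does not look inside zip files.
    """
    parts = resource.split("/")
    for base in search_dirs:
        candidate = os.path.join(base, *parts)
        if os.path.exists(candidate):
            return base
    return None
```

Calling it with the resource name and the directories from the error message, for example locate_resource('taggers/maxent_treebank_pos_tagger/english.pickle', ['C:\\nltk_data', 'C:\\pyhome\\nltk_data']), shows whether an earlier download landed in a directory that NLTK searches.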

Below is my code in Azure ML:

def azureml_main(dataframe1 = None, dataframe2 = None):
    # import required packages
    import pandas as pd
    import nltk

    # Uncommenting the downloads avoids the LookupError, but repeats the download on every run
    #nltk.download('all')
    #nltk.download(info_or_id='punkt', download_dir='C:/users/client/nltk_data')
    #nltk.download(info_or_id='maxent_treebank_pos_tagger', download_dir='C:/users/client/nltk_data')

    # tokenize each tweet and collect the (word, tag) pairs from every row
    tagged_tokens = []
    for text in dataframe1["tweet_text"]:
        # decode assumes the column holds byte strings (Python 2)
        tokens = nltk.word_tokenize(text.decode('utf8'))
        tagged_tokens.extend(nltk.pos_tag(tokens))

    # convert the tagged tokens to a dataframe object
    dataframe_output = pd.DataFrame(tagged_tokens, columns=['Word', 'Type'])
    return [dataframe_output]
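
One way to avoid calling the downloader unconditionally is to check for the resource first and download only when the lookup fails. The sketch below is a hedged illustration, not a confirmed fix: the helper name ensure_nltk_resource and its nltk_module parameter are hypothetical, and it only saves work if download_dir survives between runs, which the question does not establish for the Azure ML sandbox. It relies on nltk.data.path, nltk.data.find and nltk.download from NLTK.

```python
def ensure_nltk_resource(resource, package_id, download_dir, nltk_module=None):
    """Download package_id into download_dir only if resource is missing.

    Returns True if a download was triggered, False if the resource
    was already found. nltk_module is a hypothetical hook so the logic
    can be exercised without NLTK installed; by default the real nltk
    package is imported.
    """
    if nltk_module is None:
        import nltk as nltk_module

    # Make sure NLTK searches the directory we download into.
    if download_dir not in nltk_module.data.path:
        nltk_module.data.path.append(download_dir)

    try:
        # nltk.data.find raises LookupError when the resource is absent.
        nltk_module.data.find(resource)
        return False
    except LookupError:
        nltk_module.download(package_id, download_dir=download_dir)
        return True
```

Inside azureml_main this could be called once before the loop, for example ensure_nltk_resource('taggers/maxent_treebank_pos_tagger/english.pickle', 'maxent_treebank_pos_tagger', 'C:/users/client/nltk_data'), and likewise for punkt with the resource path 'tokenizers/punkt'.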

0 answers:

No answers