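Azure ML Experiments provide ways to read and write CSV files to Azure blob storage through the Reader and Writer modules. However, I need to write a JSON file to blob storage. Since there is no module to do so, I am trying to do it from within an Execute Python Script module.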
# Import the necessary items
from azure.storage.blob import BlobService

def azureml_main(dataframe1 = None, dataframe2 = None):
    account_name = 'mystorageaccount'
    account_key = 'mykeyhere=='
    json_string = '{jsonstring here}'
    blob_service = BlobService(account_name, account_key)
    blob_service.put_block_blob_from_text("upload", "out.json", json_string)
    # Return value must be of a sequence of pandas.DataFrame
    return dataframe1,
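However, this results in an error: ImportError: No module named azure.storage.blob
This implies that the azure-storage Python package is not installed on Azure ML.
How can I write to Azure blob storage from inside an Azure ML Experiment?
Here is the full error message: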
Error 0085: The following error occurred during script evaluation, please view the output log for more information:
---------- Start of error message from Python interpreter ----------
data:text/plain,Caught exception while executing function: Traceback (most recent call last):
File "C:\server\invokepy.py", line 162, in batch
mod = import_module(moduleName)
File "C:\pyhome\lib\importlib\__init__.py", line 37, in import_module
__import__(name)
File "C:\temp\azuremod.py", line 19, in <module>
from azure.storage.blob import BlobService
ImportError: No module named azure.storage.blob
---------- End of error message from Python interpreter ----------
Start time: UTC 02/06/2016 17:59:47
End time: UTC 02/06/2016 18:00:00
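Thanks, everyone!
UPDATE: Thanks to Dan and Peter for the ideas below. This is the progress I have made using those recommendations. I created a clean Python 2.7 virtual environment (in VS 2005) and ran pip install azure-storage to get the dependencies into my site-packages directory. I then zipped the site-packages folder and uploaded it as the Zip file, as per Dan's note below. I then added a reference to the site-packages directory to sys.path and successfully imported the required items. This resulted in a timeout error when writing to blob storage.
Here is my code: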
# Get access to the uploaded Python packages
import sys
# Raw string so the backslashes are never read as escape sequences
packages = r".\Script Bundle\site-packages"
sys.path.append(packages)

# Import the necessary items from packages referenced above
from azure.storage.blob import BlobService
from azure.storage.queue import QueueService

def azureml_main(dataframe1 = None, dataframe2 = None):
    account_name = 'mystorageaccount'
    account_key = 'p8kSy3F...elided...3plQ=='
    blob_service = BlobService(account_name, account_key)
    blob_service.put_block_blob_from_text("upload", "out.txt", "Test to write")
    # All of the following also fail
    #blob_service.create_container('images')
    #blob_service.put_blob("upload","testme.txt","foo","BlockBlob")
    #queue_service = QueueService(account_name, account_key)
    #queue_service.create_queue('taskqueue')
    # Return value must be of a sequence of pandas.DataFrame
    return dataframe1,
Here is the new error log:
Error 0085: The following error occurred during script evaluation, please view the output log for more information:
---------- Start of error message from Python interpreter ----------
data:text/plain,C:\pyhome\lib\site-packages\requests\packages\urllib3\util\ssl_.py:79: InsecurePlatformWarning: A true SSLContext object is not available. This prevents urllib3 from configuring SSL appropriately and may cause certain SSL connections to fail. For more information, see https://urllib3.readthedocs.org/en/latest/security.html#insecureplatformwarning.
InsecurePlatformWarning
Caught exception while executing function: Traceback (most recent call last):
File "C:\server\invokepy.py", line 169, in batch
odfs = mod.azureml_main(*idfs)
File "C:\temp\azuremod.py", line 44, in azureml_main
blob_service.put_blob("upload","testme.txt","foo","BlockBlob")
File ".\Script Bundle\site-packages\azure\storage\blob\blobservice.py", line 883, in put_blob
self._perform_request(request)
File ".\Script Bundle\site-packages\azure\storage\storageclient.py", line 171, in _perform_request
resp = self._filter(request)
File ".\Script Bundle\site-packages\azure\storage\storageclient.py", line 160, in _perform_request_worker
return self._httpclient.perform_request(request)
File ".\Script Bundle\site-packages\azure\storage\_http\httpclient.py", line 181, in perform_request
self.send_request_body(connection, request.body)
File ".\Script Bundle\site-packages\azure\storage\_http\httpclient.py", line 143, in send_request_body
connection.send(request_body)
File ".\Script Bundle\site-packages\azure\storage\_http\requestsclient.py", line 81, in send
self.response = self.session.request(self.method, self.uri, data=request_body, headers=self.headers, timeout=self.timeout)
File "C:\pyhome\lib\site-packages\requests\sessions.py", line 464, in request
resp = self.send(prep, **send_kwargs)
File "C:\pyhome\lib\site-packages\requests\sessions.py", line 576, in send
r = adapter.send(request, **kwargs)
File "C:\pyhome\lib\site-packages\requests\adapters.py", line 431, in send
raise SSLError(e, request=request)
SSLError: The write operation timed out
---------- End of error message from Python interpreter ----------
Start time: UTC 02/10/2016 15:33:00
End time: UTC 02/10/2016 15:34:18
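My current lead is that the azure-storage Python package depends on the requests package. requests has a known bug in Python 2.7 when calling newer SSL protocols. I am not sure, but I am digging around in that area now.
UPDATE 2: This code runs perfectly fine in a Python 3 Jupyter notebook. Additionally, if I open the blob container to public access, I can read directly from the container through a URL. For instance: df = pd.read_csv("https://mystorageaccount.blob.core.windows.net/upload/test.csv") easily loads the file from blob storage. However, I cannot use azure.storage.blob.BlobService to read from the same file.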
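The direct URL used in Update 2 follows the usual public endpoint pattern for blob storage: account name as the host prefix, then container, then blob name. A minimal sketch of building that URL is below. The helper name blob_url and its protocol parameter are illustrative only and are not part of any Azure SDK; it also assumes the default public Azure cloud and names that need no URL quoting.

```python
def blob_url(account_name, container_name, blob_name, protocol="https"):
    """Build the endpoint URL for a blob (hypothetical helper, not an SDK call).

    Assumes the default public Azure cloud and that the container and blob
    names contain no characters that need URL quoting.
    """
    if protocol not in ("http", "https"):
        raise ValueError("protocol must be 'http' or 'https'")
    return "{0}://{1}.blob.core.windows.net/{2}/{3}".format(
        protocol, account_name, container_name, blob_name)


# Same account, container and file names as in the question.
url = blob_url("mystorageaccount", "upload", "test.csv")
print(url)
```

A URL like this only works for anonymous reads when the container allows public access, which is why it succeeds in Update 2 even though the authenticated BlobService call does not.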
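UPDATE 3: Dan, in a comment below, suggested I try from the Jupyter notebooks hosted on Azure ML. I had been running it from a local Jupyter notebook (see Update 2 above). However, it fails when run from an Azure ML notebook, and the errors point to the requests package again. I will need to find the known issues with that package, but from my reading, the known issue is with urllib3 and only affects Python 2.7, not any Python 3.x version. And this was run in a Python 3.x notebook. Grrr.
UPDATE 4: As Dan says below, this may be an issue with Azure ML networking, since Execute Python Script is relatively new and only just got networking support. However, I have also tested this on an Azure App Service webjob, which is on an entirely different Azure platform. (It is also an entirely different Python distribution and supports both Python 2.7 and 3.4/5, but only 32-bit, even on 64-bit machines.) The code fails there as well, with an InsecurePlatformWarning message.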
[02/08/2016 15:53:54 > b40783: SYS INFO] Run script 'ListenToQueue.py' with script host - 'PythonScriptHost'
[02/08/2016 15:53:54 > b40783: SYS INFO] Status changed to Running
[02/08/2016 15:54:09 > b40783: INFO] test.csv
[02/08/2016 15:54:09 > b40783: ERR ] D:\home\site\wwwroot\env\Lib\site-packages\requests\packages\urllib3\util\ssl_.py:315: SNIMissingWarning: An HTTPS request has been made, but the SNI (Subject Name Indication) extension to TLS is not available on this platform. This may cause the server to present an incorrect TLS certificate, which can cause validation failures. For more information, see https://urllib3.readthedocs.org/en/latest/security.html#snimissingwarning.
[02/08/2016 15:54:09 > b40783: ERR ] SNIMissingWarning
[02/08/2016 15:54:09 > b40783: ERR ] D:\home\site\wwwroot\env\Lib\site-packages\requests\packages\urllib3\util\ssl_.py:120: InsecurePlatformWarning: A true SSLContext object is not available. This prevents urllib3 from configuring SSL appropriately and may cause certain SSL connections to fail. For more information, see https://urllib3.readthedocs.org/en/latest/security.html#insecureplatformwarning.
[02/08/2016 15:54:09 > b40783: ERR ] InsecurePlatformWarning
[02/08/2016 15:54:09 > b40783: ERR ] D:\home\site\wwwroot\env\Lib\site-packages\requests\packages\urllib3\util\ssl_.py:120: InsecurePlatformWarning: A true SSLContext object is not available. This prevents urllib3 from configuring SSL appropriately and may cause certain SSL connections to fail. For more information, see https://urllib3.readthedocs.org/en/latest/security.html#insecureplatformwarning.
[02/08/2016 15:54:09 > b40783: ERR ] InsecurePlatformWarning
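Both warnings in these logs come from urllib3 probing the interpreter's ssl module: InsecurePlatformWarning is raised when ssl.SSLContext is unavailable, and SNIMissingWarning when SNI support is missing. A small diagnostic sketch, using only the standard library, can show which capabilities a given environment has. The function name ssl_capabilities is made up for illustration, and this only reports capabilities; it does not prove what causes the timeout.

```python
import ssl
import sys


def ssl_capabilities():
    """Collect the TLS-related capabilities of the running interpreter."""
    return {
        "python_version": sys.version.split()[0],
        "openssl_version": ssl.OPENSSL_VERSION,
        # urllib3 emits InsecurePlatformWarning when SSLContext is missing.
        "has_sslcontext": hasattr(ssl, "SSLContext"),
        # urllib3 emits SNIMissingWarning when SNI support is absent.
        "has_sni": getattr(ssl, "HAS_SNI", False),
    }


for name, value in sorted(ssl_capabilities().items()):
    print("{0}: {1}".format(name, value))
```

Running this inside the Execute Python Script module, the hosted notebook, and the webjob would make it easier to compare the three environments side by side.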
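Answer 0 (score: 5)
Bottom Line Up Front: Use HTTP instead of HTTPS to access Azure storage.
When declaring BlobService, pass in protocol='http' to force the service to communicate over HTTP. Note that your container must be configured to allow requests over HTTP (which it is by default).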
client = BlobService(STORAGE_ACCOUNT, STORAGE_KEY, protocol="http")
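History and credit:
I posted a query about this topic to @AzureHelps, and they opened a ticket on the MSDN forums: https://social.msdn.microsoft.com/Forums/azure/en-US/46166b22-47ae-4808-ab87-402388dd7a5c/trouble-writing-blob-storage-file-in-azure-ml-experiment?forum=MachineLearning&prof=required
Sudarshan Raghunathan replied with the magic. Here are the steps so that everyone can easily duplicate the fix: connect the zipped packages to the third (zip) input port of the Execute Python Script module, and create the BlobService object with protocol='http'.
Some sample code can be found here: https://gist.github.com/drdarshan/92fff2a12ad9946892df
The code I used is below. It does not write the data to the file system first, but sends it as a text stream.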
from azure.storage.blob import BlobService

def azureml_main(dataframe1 = None, dataframe2 = None):
    account_name = 'mystorageaccount'
    account_key = 'p8kSy3FACx...redacted...ebz3plQ=='
    container_name = "upload"
    json_output_file_name = 'testfromml.json'
    json_orient = 'records' # Can be index, records, split, columns, values
    json_force_ascii = False
    blob_service = BlobService(account_name, account_key, protocol='http')
    # Serialize the DataFrame to a JSON string and upload it as text
    json_text = dataframe1.to_json(orient=json_orient, force_ascii=json_force_ascii)
    blob_service.put_block_blob_from_text(container_name, json_output_file_name, json_text)
    # Return value must be of a sequence of pandas.DataFrame
    return dataframe1,
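To make the json_orient setting more concrete, here is a rough sketch of the layouts that the 'records' and 'split' orientations correspond to. It is built with the standard json module rather than pandas so that it runs anywhere; the column names and rows are made-up sample data, and the exact whitespace of real to_json output may differ.

```python
import json

# Hypothetical sample table: two columns, two rows.
columns = ["name", "score"]
rows = [["Ann", 1], ["Bob", 2]]

# 'records' layout: a list with one {column: value} object per row.
records = [dict(zip(columns, row)) for row in rows]

# 'split' layout: column labels, index labels and row data kept separately.
split = {"columns": columns, "index": list(range(len(rows))), "data": rows}

records_json = json.dumps(records)
split_json = json.dumps(split)
print(records_json)
print(split_json)
```

The 'records' layout is usually the easiest for downstream consumers that expect a plain array of objects, which may be why it is the choice in the code above.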
Some thoughts:
Huge props to Dan, Peter, and Sudarshan, all from Microsoft, for their help in resolving this. I very much appreciate it!
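Answer 1 (score: 1)
You are going down the correct path. The Execute Python Script module is meant for custom needs like this one. Your real issue is how to import existing Python script modules. The complete directions can be found at the link below, but I will summarize them for SO.
You will want to take the Azure Python SDK, zip it up, upload it, and then import it into your module. I can look into why it is not there by default...
https://azure.microsoft.com/en-us/documentation/articles/machine-learning-execute-python-scripts/
Importing existing Python script modules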
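A common use case for many data scientists is to incorporate existing Python scripts into Azure Machine Learning experiments. Instead of concatenating and pasting all the code into a single script box, the Execute Python Script module accepts a third input port to which a zip file containing the Python modules can be connected. The file is then unzipped by the execution framework at runtime, and its contents are added to the library path of the Python interpreter. The azureml_main entry point function can then import these modules directly.
As an example, consider the file Hello.py containing a simple "Hello, World" function.
Figure 4. User-defined function.
Next, we can create a file Hello.zip containing Hello.py:
Figure 5. Zip file containing user-defined Python code.
Then, upload this as a dataset into Azure Machine Learning Studio. We can then create and run a simple experiment that uses the module:
Figure 6. Sample experiment with user-defined Python code uploaded as a zip file.
The module output shows that the zip file has been unpackaged and the function print_hello has indeed been run.
Figure 7. User-defined function in use inside the Execute Python Script module.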
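Since the figures did not survive, the Hello.py walkthrough can be approximated locally. The sketch below writes a tiny Hello.py, packages it as Hello.zip, then mimics what the runtime is described as doing: unzip into a bundle folder, add that folder to sys.path, and import from azureml_main. The body of print_hello and the temporary folder layout are assumptions for illustration; only the file, function, and entry point names come from the text.

```python
import os
import sys
import tempfile
import zipfile

work_dir = tempfile.mkdtemp()

# 1. Write a tiny module, Hello.py (the function body is hypothetical).
module_path = os.path.join(work_dir, "Hello.py")
with open(module_path, "w") as f:
    f.write("def print_hello():\n"
            "    message = 'Hello, World'\n"
            "    print(message)\n"
            "    return message\n")

# 2. Package it as Hello.zip, the file that would be uploaded as a dataset.
zip_path = os.path.join(work_dir, "Hello.zip")
with zipfile.ZipFile(zip_path, "w") as archive:
    archive.write(module_path, arcname="Hello.py")

# 3. Mimic the runtime: unzip into a bundle folder and add it to sys.path.
bundle_dir = os.path.join(work_dir, "Script Bundle")
with zipfile.ZipFile(zip_path) as archive:
    archive.extractall(bundle_dir)
sys.path.append(bundle_dir)

# 4. The entry point can now import the module directly.
import Hello


def azureml_main(dataframe1=None, dataframe2=None):
    Hello.print_hello()
    # Return value must be a sequence of pandas.DataFrame
    return dataframe1,


result = azureml_main()
```

In the real module, steps 3 and the sys.path change are done by the execution framework, so only the import and the azureml_main function would appear in the script box.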
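Answer 2 (score: 1)
As far as I know, you can use other packages through a zip file that you provide to the third input. The comments in the Python template script in Azure ML say:
If a zip file is connected to the third input port, it is unzipped under ".\Script Bundle". This directory is added to sys.path. Therefore, if your zip file contains a Python file mymodule.py, you can import it using: import mymodule
So you can package azure-storage-python as a zip file, then click New, click Dataset, and select From local file with the Zip file option to upload the zip file to your workspace.
For reference, see the How to Use Execute Python Script section of the Execute Python Script documentation for more information.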
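Packaging a whole site-packages folder by hand is easy to get slightly wrong, so a scripted version may help. The sketch below uses shutil.make_archive and keeps the folder itself as the top-level entry of the zip, so that after unzipping it would sit at ".\Script Bundle\site-packages", the path the question's update appends to sys.path. The helper name zip_site_packages and the throwaway demo folder are hypothetical; a real run would point at the site-packages directory of the virtual environment where azure-storage was installed.

```python
import os
import shutil
import tempfile
import zipfile


def zip_site_packages(site_packages_dir, output_base):
    """Zip a folder, keeping the folder itself as the top-level zip entry.

    Returns the path of the archive that was created (output_base + '.zip').
    """
    parent_dir, folder_name = os.path.split(os.path.abspath(site_packages_dir))
    return shutil.make_archive(output_base, "zip",
                               root_dir=parent_dir, base_dir=folder_name)


# Demo with a throwaway folder standing in for a real virtual environment.
demo_root = tempfile.mkdtemp()
fake_package = os.path.join(demo_root, "site-packages", "mypackage")
os.makedirs(fake_package)
with open(os.path.join(fake_package, "__init__.py"), "w") as f:
    f.write("VALUE = 42\n")

archive_path = zip_site_packages(os.path.join(demo_root, "site-packages"),
                                 os.path.join(demo_root, "bundle"))
with zipfile.ZipFile(archive_path) as archive:
    names = archive.namelist()
print(names)
```

If the zip instead contains the packages at its root, they land directly under ".\Script Bundle", which is already on sys.path, so no extra sys.path.append is needed in that layout.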