Question

I have an HDInsight cluster running Spark 1.6.2 & Jupyter. In a Jupyter notebook I run my pyspark commands, and some of the output is processed in a pandas DataFrame.

As a final step, I would like to save my pandas DataFrame to a CSV file and either:

save it to the 'jupyter filesystem' and download it to my laptop, or
save it to my blob storage.

But I don't know how to do either.

I tried the following:

1. Save it to the 'jupyter filesystem' and download it to my laptop

# df is my resulting dataframe, so I save it to the filesystem where jupyter runs
df.to_csv('app_keys.txt')

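I expected it to be saved in the same directory as my notebook, so that I would see it in the browser's tree view. That is not the case. So my question is: where in the filesystem is this file saved?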
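A relative path passed to `to_csv` is resolved against the current working directory of the Python process that runs the cell, which is not necessarily the folder shown in the Jupyter tree view. A minimal sketch for printing where the file actually lands, assuming a plain Python/pandas environment (the DataFrame contents are made up for illustration; with the HDInsight PySpark kernel the cell may run in a remote session on the cluster, in which case the printed path refers to that machine):

```python
import os

import pandas as pd

# Illustrative stand-in for the real result DataFrame (hypothetical data).
df = pd.DataFrame({"app": ["app1", "app2"], "key": ["k1", "k2"]})

# A relative file name is resolved against the process's current working directory.
df.to_csv("app_keys.txt")

saved_path = os.path.abspath("app_keys.txt")
print("Working directory:", os.getcwd())
print("CSV written to:", saved_path)
```

If the printed directory is not the notebook's folder, passing an absolute path to `to_csv` makes the location explicit.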

2. Save it to my blob storage

After some googling, it seems I can also use the azure.storage.blob module to upload files to blob storage. So I tried:

from azure.storage.blob import BlobService # a lot of examples online import BlockBlobService but this one is not available in HDInsight

# I have all variables in CAPITALS defined elsewhere in the code
blob_service=BlobService(account_name=STORAGEACCOUNTNAME,account_key=STORAGEACCOUNTKEY)

# check if reading from blob works
blob_service.get_blob_to_path(CONTAINERNAME, 'iris.txt', 'mylocalfile.txt') # this works

# now try to reverse the process and write to blob
blob_service.create_blob_from_path(CONTAINERNAME,'myblobfile.txt','mylocalfile.txt')   # fails with AttributeError: 'BlobService' object has no attribute 'create_blob_from_path'

or

blob_service.create_blob_from_text(CONTAINERNAME,'myblobfile.txt','mylocalfile.txt') # fails with 'BlobService' object has no attribute 'create_blob_from_text'

So I don't know how to write back to blob storage, or how to access the file I wrote from pandas to the filesystem.
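For reference, the third argument of `create_blob_from_text` is the text content of the blob, not a local file name, so the call above would upload the literal string 'mylocalfile.txt' even on a library version that has the method. Calling `to_csv()` without a path returns the CSV as a string, which avoids the local filesystem entirely. A sketch, assuming a newer azure-storage release that provides `BlockBlobService.create_blob_from_text`; the helper name `dataframe_to_csv_text` and the blob name `app_keys.csv` are hypothetical:

```python
import pandas as pd


def dataframe_to_csv_text(df):
    """Serialize a DataFrame to CSV text in memory (no file is written)."""
    # With no path argument, DataFrame.to_csv returns the CSV as a str.
    return df.to_csv(index=False)


# Illustrative stand-in for the real result DataFrame (hypothetical data).
df = pd.DataFrame({"app": ["app1", "app2"], "key": ["k1", "k2"]})
csv_text = dataframe_to_csv_text(df)
print(csv_text)

# Assumption: a newer azure-storage release where BlockBlobService exists, e.g.
#   from azure.storage.blob import BlockBlobService
#   service = BlockBlobService(account_name=STORAGEACCOUNTNAME, account_key=STORAGEACCOUNTKEY)
#   service.create_blob_from_text(CONTAINERNAME, 'app_keys.csv', csv_text)
```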

Any help is appreciated.

Answer 1

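In my experience, the second problem you ran into is caused by the version of the Azure Storage client library for Python: older versions of the library do not include the methods you are calling in your code. The following link should be useful to you.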
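To see which upload methods the installed client actually provides, you can inspect it, e.g. `print([m for m in dir(blob_service) if 'blob' in m])`. As far as I know, older `BlobService` releases exposed `put_block_blob_from_path`, while later releases moved to `BlockBlobService.create_blob_from_path`. A minimal version-tolerant sketch under that assumption; the helper `upload_file_to_blob` and the `FakeOldBlobService` class are hypothetical, and the fake only stands in for a real client so the example runs without credentials:

```python
def upload_file_to_blob(service, container, blob_name, local_path):
    """Upload a local file using whichever method this client version offers.

    Assumption: newer azure-storage clients have create_blob_from_path,
    older BlobService clients have put_block_blob_from_path.
    """
    for method_name in ("create_blob_from_path", "put_block_blob_from_path"):
        method = getattr(service, method_name, None)
        if callable(method):
            return method(container, blob_name, local_path)
    raise AttributeError(
        "No known upload method on %s; consider upgrading the azure-storage package"
        % type(service).__name__
    )


class FakeOldBlobService:
    """Hypothetical stand-in for an old-style client; records calls instead of uploading."""

    def __init__(self):
        self.calls = []

    def put_block_blob_from_path(self, container, blob_name, file_path):
        self.calls.append(("put_block_blob_from_path", container, blob_name, file_path))


fake = FakeOldBlobService()
upload_file_to_blob(fake, "mycontainer", "app_keys.txt", "app_keys.txt")
print(fake.calls)
```

With a real client you would pass `blob_service` instead of the fake; if neither method exists, upgrading the library is the more reliable fix.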

How to import Azure BlobService in python?

How do I download my file from the HDInsight cluster after saving it with pd.to_csv()?
