Question

My Jupyter notebook reads a module that I created. I add it in Jupyter with

sc.addPyFile('wasb:///HdiNotebooks/PySpark/project/read_test_data.py')

and the module then loads fine.

However, my ".py" file opens the data as:

data_file= open('wasb:///example/data/fruits.txt', 'rU') 
to prepare it and do different calculations.

But I get the following error:

[Errno 2] No such file or directory: 'wasb:///example/data/fruits.txt'
Traceback (most recent call last):
FileNotFoundError: [Errno 2] No such file or directory: 'wasb:///example/data/fruits.txt'

If I try to create a DataFrame from the same data in Jupyter, I run

df=sqlContext.read.csv('wasb:///example/data/fruits.txt',header='true', inferSchema='true')

and I get no error. What am I doing wrong?

Answer 1

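Python's built-in open does not support the wasb protocol of the HDInsight DFS, which is backed by Azure Blob Storage. It only understands paths on the local file system, so it treats 'wasb:///example/data/fruits.txt' as a local path and fails with FileNotFoundError.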
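
Since Spark itself resolves wasb paths (the sqlContext.read.csv call in the question works), one option is to let the module read the file through the SparkContext instead of open. A minimal sketch, assuming the notebook passes its sc into the module; the function name load_fruit_lines is hypothetical and not part of the original read_test_data.py:

```python
def load_fruit_lines(sc, path="wasb:///example/data/fruits.txt"):
    """Return the lines of a text file as a list of str, read through Spark.

    `sc` is the SparkContext that the notebook already provides.
    `load_fruit_lines` is a hypothetical helper name used for illustration.
    """
    # sc.textFile goes through Hadoop's file-system layer, which is what
    # understands wasb:// on HDInsight; the built-in open() does not.
    rdd = sc.textFile(path)
    # collect() pulls every line to the driver, so this only suits small files.
    return rdd.collect()
```

In the notebook this would be called as read_test_data.load_fruit_lines(sc) after sc.addPyFile(...) and import read_test_data. For large files it is probably better to keep the data as an RDD or DataFrame rather than collecting it.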

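If you want to read the file directly on HDInsight without PySpark, the only way is to use the Azure Storage SDK for Python with the account_name & account_key of the Azure Blob Storage account behind your HDInsight cluster. Please refer to the official tutorial for Azure Storage in Python.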
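
A rough sketch of what that could look like, assuming the azure-storage-blob package (version 12 or later) is installed on the node running the code. The helper names split_wasb_uri and read_blob_text are hypothetical; the account name, container, and key are placeholders that you have to supply, because a short wasb:/// path does not carry them:

```python
from urllib.parse import urlparse


def split_wasb_uri(uri, default_container=None, default_account=None):
    """Split a wasb:// URI into (account, container, blob_name).

    Full form: wasb://<container>@<account>.blob.core.windows.net/<path>
    Short form: wasb:///<path>, which relies on the cluster defaults, so the
    caller has to pass default_container and default_account explicitly.
    """
    parsed = urlparse(uri)
    if parsed.scheme not in ("wasb", "wasbs"):
        raise ValueError("not a wasb URI: %r" % uri)
    if parsed.netloc:
        container, _, host = parsed.netloc.partition("@")
        account = host.split(".")[0]
    else:
        container, account = default_container, default_account
    if not container or not account:
        raise ValueError("container and account could not be determined")
    return account, container, parsed.path.lstrip("/")


def read_blob_text(uri, account_key, default_container=None,
                   default_account=None, encoding="utf-8"):
    """Download the blob behind a wasb URI and return its content as str."""
    # Imported here so the module still imports where the SDK is missing.
    # Assumes azure-storage-blob >= 12.
    from azure.storage.blob import BlobServiceClient

    account, container, blob_name = split_wasb_uri(
        uri, default_container, default_account)
    service = BlobServiceClient(
        account_url="https://%s.blob.core.windows.net" % account,
        credential=account_key,
    )
    blob = service.get_blob_client(container=container, blob=blob_name)
    return blob.download_blob().readall().decode(encoding)
```

The module could then replace the open call with something like read_blob_text('wasb:///example/data/fruits.txt', account_key, default_container, default_account).splitlines(), where the three extra values come from your own storage account settings.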

Hope it helps.

Using my own Python module in Azure PySpark that reads and prepares data
