I ran into a problem while trying to reproduce the example shown here - https://docs.microsoft.com/en-us/azure/hdinsight/spark/apache-spark-load-data-run-query.
It seems to fail at: hvacTable = sqlContext.createDataFrame(hvac)
The error it returns is:
'PipelinedRDD' object has no attribute '_get_object_id'
Traceback (most recent call last):
File "/usr/hdp/current/spark2-client/python/pyspark/sql/context.py", line 333, in createDataFrame
return self.sparkSession.createDataFrame(data, schema, samplingRatio, verifySchema)
File "/usr/hdp/current/spark2-client/python/lib/py4j-0.10.4-src.zip/py4j/java_gateway.py", line 1124, in __call__
args_command, temp_args = self._build_args(*args)
File "/usr/hdp/current/spark2-client/python/lib/py4j-0.10.4-src.zip/py4j/java_gateway.py", line 1094, in _build_args
[get_command_part(arg, self.pool) for arg in new_args])
File "/usr/hdp/current/spark2-client/python/lib/py4j-0.10.4-src.zip/py4j/protocol.py", line 289, in get_command_part
command_part = REFERENCE_TYPE + parameter._get_object_id()
AttributeError: 'PipelinedRDD' object has no attribute '_get_object_id'
I was following the example in a PySpark notebook in Jupyter.
Why does this error occur?
Answer 0 (score: 0)
You are probably running this on a newer cluster. Update "sqlContext" to "spark" to make it work. We will also update this doc article.
Also, in Spark 2.x you can now do this more simply with the DataFrame API. You can replace the snippet that creates the hvac table with the following equivalent:
csvFile = spark.read.csv('wasb:///HdiSamples/HdiSamples/SensorSampleData/hvac/HVAC.csv', header=True, inferSchema=True)
csvFile.write.saveAsTable("hvac")