I am loading data from a Datalake and then selecting the fields that are listed in a CSV file. When I try to display the result, I get this error:
AttributeError: 'PipelinedRDD' object has no attribute 'show'
When I try to convert the PipelinedRDD to a DataFrame with toDF(), I get this error:
Py4JError: An error occurred while calling o2871.__getnewargs__. Trace:
py4j.Py4JException: Method __getnewargs__([]) does not exist
Can anyone help me?
from pyspark.sql.functions import split, col
jump_change_raw = sqlContext.read.format("com.databricks.spark.avro")\
    .load("JUMP_CHANGES/SV1/HISTORY/*.avro")
ddlk = sqlContext.read.format("com.databricks.spark.csv").load("/user/ddlk.csv")
label_fields = ddlk.select(split(ddlk.C0, ";").alias("fields"))
dlk_fields = label_fields.select(
    label_fields.fields[1].alias("jump_dlk"),
    label_fields.fields[3].alias("impulse_dlk"),
    label_fields.fields[4].alias("dlk")
).filter(col("dlk") != "")
jump = jump_change_raw.select(
    jump_change_raw.columns
).map(lambda x: dlk_fields.jump_dlk.contains(x)).toDF()
jump.show()