Question

I have the following Spark script:

from pyspark import SparkContext, SparkConf
from pyspark.sql import SQLContext, HiveContext
spark_context = SparkContext(conf=SparkConf())
sqlContext = HiveContext(spark_context)
outputPartition=sqlContext.sql("select * from dm_mmx_merge.PLAN_PARTITION ORDER BY PARTITION,ROW_NUM")
outputPartition.printSchema()
outputPartition.filter(outputPartition("partition")==3).show()


The schema is printed as:

root
 |-- seq: integer (nullable = true)
 |-- cpo_cpo_id: long (nullable = true)
 |-- mo_sesn_yr_cd: string (nullable = true)
 |-- prod_prod_cd: string (nullable = true)
 |-- cmo_ctry_nm: string (nullable = true)
 |-- cmo_cmo_stat_ind: string (nullable = true)
 |-- row_num: integer (nullable = true)
 |-- partition: long (nullable = true)

But I also get this error:

Traceback (most recent call last):
  File "hiveSparkTest.py", line 18, in <module>
    outputPartition.filter(outputPartition(partition)==3).show()
TypeError: 'DataFrame' object is not callable

I need to get the output for each partition value and apply a transformation to it. Any help would be much appreciated.

Answer 1

In the line

 outputPartition.filter(outputPartition(partition)==3).show()

you are trying to call outputPartition as if it were a method. Use

 outputPartition['partition']

instead of

 outputPartition(partition)

A Spark DataFrame, including one returned by HiveContext.sql, is not callable.
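The error comes from plain Python rules rather than anything Hive-specific: parentheses invoke an object's `__call__`, while square brackets invoke `__getitem__`. The following minimal sketch reproduces the same kind of TypeError without Spark. `FrameLike` is a hypothetical stand-in class invented for illustration, not part of PySpark.

```python
class FrameLike:
    """Hypothetical stand-in: supports frame["name"] but defines no __call__."""

    def __init__(self, columns):
        self._columns = dict(columns)

    def __getitem__(self, name):
        # Reached by the square-bracket syntax: frame["partition"]
        return self._columns[name]


frame = FrameLike({"partition": [1, 1, 3]})

# Subscript access works because __getitem__ is defined.
print(frame["partition"])

# Call syntax fails because the class has no __call__ method.
try:
    frame("partition")
    error_message = None
except TypeError as exc:
    error_message = str(exc)

print(error_message)  # 'FrameLike' object is not callable
```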
