I have the following Spark script:
from pyspark import SparkContext, SparkConf
from pyspark.sql import SQLContext, HiveContext
spark_context = SparkContext(conf=SparkConf())
sqlContext = HiveContext(spark_context)
outputPartition=sqlContext.sql("select * from dm_mmx_merge.PLAN_PARTITION ORDER BY PARTITION,ROW_NUM")
outputPartition.printSchema()
outputPartition.filter(outputPartition("partition")==3).show()
The schema is printed as:
root
|-- seq: integer (nullable = true)
|-- cpo_cpo_id: long (nullable = true)
|-- mo_sesn_yr_cd: string (nullable = true)
|-- prod_prod_cd: string (nullable = true)
|-- cmo_ctry_nm: string (nullable = true)
|-- cmo_cmo_stat_ind: string (nullable = true)
|-- row_num: integer (nullable = true)
|-- partition: long (nullable = true)
But I also get this error:
Traceback (most recent call last):
File "hiveSparkTest.py", line 18, in <module>
outputPartition.filter(outputPartition(partition)==3).show()
TypeError: 'DataFrame' object is not callable
I need to get the output for each partition value and apply transformations to it. Any help would be greatly appreciated.
Answer 0 (score: 2)
In the line
outputPartition.filter(outputPartition(partition)==3).show()
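you are calling outputPartition as if it were a method, but a DataFrame object is not callable. Columns are selected with square brackets, so use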
outputPartition['partition']
instead of
outputPartition(partition)
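To see why Python reports "object is not callable", here is a minimal sketch in plain Python. `ToyFrame` is a hypothetical stand-in written only for this illustration; it is not PySpark's DataFrame. It mimics one relevant trait: it supports square-bracket access (`__getitem__`) but defines no `__call__`.

```python
class ToyFrame:
    """Hypothetical stand-in: supports frame['col'] but not frame('col')."""

    def __init__(self, columns):
        self._columns = dict(columns)

    def __getitem__(self, name):
        # Square-bracket access is routed here.
        return self._columns[name]


frame = ToyFrame({"partition": [1, 2, 3, 3]})

# Square brackets work because __getitem__ is defined.
print(frame["partition"])

# Parentheses try to *call* the object; there is no __call__, so Python raises TypeError.
try:
    frame("partition")
except TypeError as exc:
    error_message = str(exc)
    print(error_message)
```

In the original script, the corrected line would presumably read `outputPartition.filter(outputPartition['partition'] == 3).show()`. Attribute access (`outputPartition.partition`) should select the same column in PySpark.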