def main(inputs, output):
    sdf = spark.read.csv(inputs, schema=observation_schema)
    sdf.registerTempTable('filtertable')
    result = spark.sql("""
    SELECT * FROM filtertable WHERE qflag IS NULL
    """).show()
    temp_max = spark.sql(""" SELECT date, station, value FROM filtertable WHERE (observation = 'TMAX')""").show()
    temp_min = spark.sql(""" SELECT date, station, value FROM filtertable WHERE (observation = 'TMIN')""").show()
    result = temp_max.join(temp_min, condition1).select(temp_max('date'), temp_max('station'), ((temp_max('TMAX')-temp_min('TMIN'))/10).alias('Range'))
Error:
Traceback (most recent call last):
File "/Users/syedikram/Documents/temp_range_sql.py", line 96, in <module>
main(inputs, output)
File "/Users/syedikram/Documents/temp_range_sql.py", line 52, in main
result = temp_max.join(temp_min, condition1).select(temp_max('date'), temp_max('station'), ((temp_max('TMAX')-temp_min('TMIN')/10)).alias('Range'))
AttributeError: 'NoneType' object has no attribute 'join'
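Performing the join gives me a NoneType object error. Searching online did not help, because there is very little online documentation for PySpark SQL. What am I doing wrong here?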
Answer 0 (score: 3)
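Remove .show() from the temp_max and temp_min assignments. show only prints the rows as a string and does not return anything, so both names end up bound to None (which is why you get AttributeError: 'NoneType' object has no attribute 'join').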
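
To make the point concrete without needing a Spark installation, here is a minimal pure-Python sketch of the same mistake. FakeFrame is a hypothetical stand-in written for this illustration, not part of the PySpark API; it only mimics the relevant behaviour, namely that show() prints and returns None while join() returns a new frame. The sample rows are made up.

```python
class FakeFrame:
    """Hypothetical stand-in for a DataFrame (NOT the PySpark API)."""

    def __init__(self, rows):
        self.rows = rows

    def show(self):
        # Like DataFrame.show(): prints the rows and implicitly returns None.
        for row in self.rows:
            print(row)

    def join(self, other):
        # Toy join: pair up rows that share the same (date, station) key.
        lookup = {(d, s): v for d, s, v in other.rows}
        return FakeFrame([
            (d, s, v, lookup[(d, s)])
            for d, s, v in self.rows
            if (d, s) in lookup
        ])


# Made-up sample rows: (date, station, value)
temp_max = FakeFrame([("20170101", "ST1", 50)])
temp_min = FakeFrame([("20170101", "ST1", -30)])

# The bug from the question: assigning the result of show() binds None,
# so a later broken.join(...) would raise AttributeError.
broken = temp_max.show()
assert broken is None

# The fix: keep the frame itself, join it, and call show() only at the end.
result = temp_max.join(temp_min)
result.show()
```

Applied to the question's code, this means assigning the result of spark.sql(...) directly to temp_max and temp_min, and calling .show() only on the final result (or on a separate line) when you want to look at the data.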