Question

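I have three timestamp columns in a Hive table that I am trying to query through PySpark using HiveContext (PySpark 1.6.0, Hadoop 2.6.0, CDH 5.9.2). One of the columns has no null values; the other two do. The query from PySpark returns valid timestamp values for the column with no nulls, but returns only NULL for the other two columns.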

Querying the timestamp columns without any cast or case statement gives NULL results in the PySpark DataFrame only for the columns that contain null values.

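I tried the following case statements. With each one, the values display correctly in Impala but not in the PySpark result, where the RDD shows '9999-12-31 00:00:00' for all values, not just the NULL ones: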

case when this_dttm is null then cast('9999-12-31 00:00:00' as string) else this_dttm end as this_date
case when this_dttm is not null then this_dttm else cast('9999-12-31 00:00:00' as string) end as this_date
case when unix_timestamp(this_dttm) > 0 then this_dttm else cast('9999-12-31 00:00:00' as timestamp) end as this_date

I have also tried converting from timestamp to string and back to timestamp, but this renders all values as NULL in Hive:

cast(concat(substr(cast(this_dttm as string),7,4),"-",substr(cast(this_dttm as string),1,2),"-",substr(cast(this_dttm as string),4,2)) as timestamp) as this_date
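
One possible reason the round-trip conversion above yields NULL for every row: the substr positions (7,4), (1,2) and (4,2) only line up with an MM/dd/yyyy layout, whereas Hive renders a timestamp cast to string as yyyy-MM-dd HH:mm:ss. The plain-Python sketch below emulates that slicing on a made-up sample value. The helper names `hive_substr` and `rebuild_as_in_question` are hypothetical and exist only for illustration; this is not Spark or Hive code.

```python
def hive_substr(s, start, length):
    """Emulate Hive's substr(str, start, length) for a positive, 1-based start."""
    return s[start - 1:start - 1 + length]


def rebuild_as_in_question(ts_string):
    """Apply the same concat/substr slicing as the query in the question."""
    return "-".join([
        hive_substr(ts_string, 7, 4),  # intended: year
        hive_substr(ts_string, 1, 2),  # intended: month
        hive_substr(ts_string, 4, 2),  # intended: day
    ])


# Hypothetical sample value in the layout Hive uses when a timestamp
# is cast to string.
as_hive_string = "2017-03-15 10:20:30"

# The same instant in the MM/dd/yyyy layout that the slicing positions assume.
as_us_string = "03/15/2017 10:20:30"

print(rebuild_as_in_question(as_hive_string))  # pieces cut across the separators
print(rebuild_as_in_question(as_us_string))    # a well-formed yyyy-MM-dd date
```

With the Hive layout the slices cut across the separators, so the concatenated string is not a valid timestamp literal, and a cast of such a string to timestamp gives NULL. This only accounts for the round-trip attempt; it does not explain the NULLs from the plain query.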

Any suggestions on handling TimestampType columns with NULL values would be appreciated.

HiveContext query in PySpark returns only NULL for columns with some NULL values

0 answers: