Spark-Phoenix connection: problem with SQL queries that filter on a date column

Asked: 2018-01-23 11:46:58

Tags: java apache-spark hbase apache-spark-sql phoenix

Following the recommendations on the official Apache Phoenix website, I am connecting to Phoenix from Spark. Simple SELECT queries run without any problem, but when I try to run a query with a filter on a date column, it throws an error.

Here is the sample code:

Map<String, String> map = new HashMap<>();
map.put("zkUrl", ZOOKEEPER_URL);
map.put("table", "TABLE_1");
Dataset<Row> df = sparkSession.sqlContext().load("org.apache.phoenix.spark", map);
df.registerTempTable("TABLE_1");

// This query works without any error
Dataset<Row> selectResult = df.sparkSession().sql(
    "SELECT COUNT(1) AS ROW_COUNT FROM TABLE_1 WHERE TEXT_COLUMN_1 = 'ABC'");
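For reference, the same load can also be written with the non-deprecated DataFrameReader API. This is only a sketch, assuming the phoenix-spark connector's documented reader options (table, zkUrl):

Dataset<Row> df = sparkSession.read()
    .format("org.apache.phoenix.spark") // Phoenix Spark data source
    .option("table", "TABLE_1")         // Phoenix table to load
    .option("zkUrl", ZOOKEEPER_URL)     // ZooKeeper quorum of the HBase cluster
    .load();
df.createOrReplaceTempView("TABLE_1");  // non-deprecated replacement for registerTempTable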

But when I run a query with a filter on the date column, it gives me an error:

Dataset<Row> selectResult = df.sparkSession().sql(
    "SELECT * FROM TABLE_1 WHERE DATE_COLUMN_1 BETWEEN to_date('2015-01-02') AND to_date('2016-12-30')");

I have tried supplying the date in several other forms, as shown below:

Dataset<Row> selectResult = df.sparkSession().sql(
    "SELECT * FROM TABLE_1 WHERE DATE_COLUMN_1 BETWEEN cast('2015-01-02' as date) AND cast('2016-12-30' as date)");

Dataset<Row> selectResult = df.sparkSession().sql(
    "SELECT * FROM TABLE_1 WHERE DATE_COLUMN_1 >= cast('2015-01-02' as date) AND DATE_COLUMN_1 <= cast('2016-12-30' as date)");
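One more variant worth trying is to build the predicate with the DataFrame API rather than embedding the literals in SQL text, so that Spark constructs typed DateType literals itself. This is only a sketch; whether the connector then pushes the filter down in a form Phoenix accepts is an assumption to verify:

import static org.apache.spark.sql.functions.col;
import static org.apache.spark.sql.functions.lit;

// Hypothetical alternative: typed java.sql.Date literals instead of SQL strings
Dataset<Row> selectResult = df.filter(
    col("DATE_COLUMN_1").geq(lit(java.sql.Date.valueOf("2015-01-02")))
        .and(col("DATE_COLUMN_1").leq(lit(java.sql.Date.valueOf("2016-12-30")))));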

Error message:

18/01/23 17:05:26 INFO PhoenixInputFormat: UseSelectColumns=true, selectColumnList.size()=1, selectColumnList=DATE_COLUMN_1 
18/01/23 17:05:26 INFO PhoenixInputFormat: Select Statement: SELECT "DATE_COLUMN_1" FROM TABLE_1 WHERE ( DATE_COLUMN_1 IS NOT NULL AND DATE_COLUMN_1 >= 2015-01-02 AND DATE_COLUMN_1 <= 2016-12-30)
18/01/23 17:05:26 ERROR PhoenixInputFormat: Failed to get the query plan with error [ERROR 203 (22005): Type mismatch. DATE and BIGINT for DATE_COLUMN_1 >= 2012]
Exception in thread "main" org.apache.spark.sql.catalyst.errors.package$TreeNodeException: execute, tree:
Exchange SinglePartition
+- *HashAggregate(keys=[], functions=[partial_count(1)], output=[count#1671L])
   +- *Project
      +- *Filter ((isnotnull(DATE_COLUMN_1#997) && (DATE_COLUMN_1#997 >= 16437)) && (DATE_COLUMN_1#997 <= 17165))
         +- *Scan PhoenixRelation(TABLE_1,localhost:2181,false) [DATE_COLUMN_1#997] PushedFilters: [IsNotNull(DATE_COLUMN_1), GreaterThanOrEqual(DATE_COLUMN_1,2015-01-02), LessThanOrEqual(TR..., ReadSchema: struct<>
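The INFO line above shows the root cause: the pushed-down Phoenix query embeds the date literals unquoted (DATE_COLUMN_1 >= 2015-01-02), so Phoenix parses them as integer arithmetic, 2015 - 01 - 02 = 2012, which matches the "DATE and BIGINT for DATE_COLUMN_1 >= 2012" type mismatch. A hypothetical check through the Phoenix JDBC driver (connection URL and query assumed) makes the misparse visible:

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

// Evaluate the unquoted literal exactly as Phoenix sees it in the pushed-down query.
// (Run inside a method that declares throws SQLException.)
try (Connection conn = DriverManager.getConnection("jdbc:phoenix:" + ZOOKEEPER_URL);
     Statement stmt = conn.createStatement();
     ResultSet rs = stmt.executeQuery("SELECT 2015-01-02 FROM TABLE_1 LIMIT 1")) {
    if (rs.next()) {
        System.out.println(rs.getLong(1)); // prints 2012, not a date
    }
}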

1 Answer:

Answer 0 (score: 0)

It works after removing the to_date() function, presumably because the bare string literals are then pushed down quoted, letting Phoenix coerce them to DATE itself.

Dataset<Row> selectResult = df.sparkSession().sql(
    "SELECT * FROM TABLE_1 WHERE DATE_COLUMN_1 BETWEEN '2015-01-02' AND '2016-12-30'");