根据凤凰官方网站的建议,我已经从火花连接凤凰。简单的选择查询没有任何问题,但是当我尝试使用日期列上的过滤器运行查询时,它会出现一些错误。
以下是示例代码
Map<String, String> map = new HashMap<>();
map.put("zkUrl", ZOOKEEPER_URL);
map.put("table", "TABLE_1");
Dataset<Row> df = sparkSession.sqlContext().load("org.apache.phoenix.spark", map);
df.registerTempTable("TABLE_1");
// This query works without any error
Dataset<Row> selectResult = df.sparkSession().sql(" SELECT COUNT(1) AS ROW_COUNT
FROM TABLE_1 WHERE TEXT_COLUMN_1 = 'ABC' ");
但是当我在日期列上运行带过滤器的查询时,它会给我错误
Dataset<Row> selectResult = df.sparkSession().sql(" SELECT * FROM TABLE_1 WHERE
DATE_COLUMN_1 BETWEEN to_date('2015-01-02') AND to_date('2016-12-30') ");
尝试了许多不同的方式来提供如下所述的日期格式
Dataset<Row> selectResult = df.sparkSession().sql(" SELECT * FROM TABLE_1 WHERE
DATE_COLUMN_1 BETWEEN cast('2015-01-02' as date) AND cast('2015-01-02' as date) ");
Dataset<Row> selectResult = df.sparkSession().sql(" SELECT * FROM TABLE_1 WHERE
DATE_COLUMN_1 <= cast('2015-01-02' as date) AND DATE_COLUMN_1 >= cast('2015-01-02' as date) ");
错误讯息: -
18/01/23 17:05:26 INFO PhoenixInputFormat: UseSelectColumns=true, selectColumnList.size()=1, selectColumnList=DATE_COLUMN_1
18/01/23 17:05:26 INFO PhoenixInputFormat: Select Statement: SELECT "DATE_COLUMN_1" FROM TABLE_1 WHERE ( DATE_COLUMN_1 IS NOT NULL AND DATE_COLUMN_1 >= 2015-01-02 AND DATE_COLUMN_1 <= 2016-12-30)
18/01/23 17:05:26 ERROR PhoenixInputFormat: Failed to get the query plan with error [ERROR 203 (22005): Type mismatch. DATE and BIGINT for DATE_COLUMN_1 >= 2012]
Exception in thread "main" org.apache.spark.sql.catalyst.errors.package$TreeNodeException: execute, tree:
Exchange SinglePartition
+- *HashAggregate(keys=[], functions=[partial_count(1)], output=[count#1671L])
+- *Project
+- *Filter ((isnotnull(DATE_COLUMN_1#997) && (DATE_COLUMN_1#997 >= 16437)) && (DATE_COLUMN_1#997 <= 17165))
+- *Scan PhoenixRelation(TABLE_1,localhost:2181,false) [DATE_COLUMN_1#997] PushedFilters: [IsNotNull(DATE_COLUMN_1), GreaterThanOrEqual(DATE_COLUMN_1,2015-01-02), LessThanOrEqual(TR..., ReadSchema: struct<>
答案 0 :(得分:0)
删除to_date()函数后,它正在工作。
Dataset<Row> selectResult = df.sparkSession().sql(" SELECT * FROM TABLE_1 WHERE
DATE_COLUMN_1 BETWEEN '2015-01-02' AND '2016-12-30' ");