I want to connect to Apache Phoenix from Spark and run a SQL join query. The official Phoenix website gives an example of how to connect to Phoenix from Spark, but the configuration takes a single Phoenix table name. See the example below:
Map<String, String> map = new HashMap<>();
map.put("zkUrl", ZOOKEEPER_URL);
map.put("table", "TABLE_1");
Dataset<Row> df = sparkSession.sqlContext().load("org.apache.phoenix.spark", map);
df.registerTempTable("TABLE_1");
Dataset<Row> selectResult = df.sparkSession().sql(" SELECT * FROM TABLE_1 WHERE COLUMN_1 = 'ABC' ");
In my Phoenix/HBase database I have two tables, TABLE_1 and TABLE_2, and I want to run a SQL query like this:
SELECT * FROM TABLE_1 as A JOIN TABLE_2 as B ON A.COLUMN_1 = B.COLUMN_2 WHERE B.COLUMN_2 = 'XYZ';
How can I run this query using the Phoenix-Spark connector?
Answer 0 (score: 0)
As @Shaido suggested in the comments, I tried this and it works. I load the two datasets separately, register each as a temporary table, and then I can run the join query across both tables. Sample code below.
String table1 = "TABLE_1";
Map<String, String> map = new HashMap<>();
map.put("zkUrl", ZOOKEEPER_URL);
map.put("table", table1);
Dataset<Row> df = sparkSession.sqlContext().load("org.apache.phoenix.spark", map);
df.registerTempTable(tableName);
String table2 = "TABLE_2";
map = new HashMap<>();
map.put("zkUrl", ZOOKEEPER_URL);
map.put("table", table2);
Dataset<Row> df2 = sparkSession.sqlContext().load("org.apache.phoenix.spark", map);
df2.registerTempTable(table2);
Dataset<Row> selectResult = df.sparkSession().sql(" SELECT * FROM TABLE_1 as A JOIN TABLE_2 as B ON A.COLUMN_1 = B.COLUMN_2 WHERE B.COLUMN_2 = 'XYZ' ");