I am creating an RDD `newData` in Scala by fetching data from a REST API. The contents of the RDD are JSON objects that look like this:
JObject(List(
  (activity-hrt, JArray(List(
    JObject(List(
      (date, JString(2018-12-21)),
      (value, JObject(List(
        (customHrtRtZns, JArray(List())),
        (hrtRtZns, JArray(List(
          JObject(List((max, JInt(88)),  (min, JInt(30)),  (name, JString(Out of Range)))),
          JObject(List((max, JInt(123)), (min, JInt(88)),  (name, JString(Fat Burn)))),
          JObject(List((max, JInt(150)), (min, JInt(123)), (name, JString(Cardio)))),
          JObject(List((max, JInt(220)), (min, JInt(150)), (name, JString(Peak))))
        )))
      )))
    ))
  )))
))
I want to load this RDD data into an HBase table through Phoenix, and I also need to decide on the corresponding Phoenix table structure. Below is the Phoenix table I created:
CREATE TABLE IF NOT EXISTS "HRT"(
"id" INTEGER NOT NULL,
"activitiesHrt"."dateTime" VARCHAR(32),
"value"."customHrtRtZns" VARCHAR(32),
"hrtRtZns"."max" INTEGER,
"hrtRtZns"."min" INTEGER,
"hrtRtZns"."name" VARCHAR(32),
CONSTRAINT pk PRIMARY KEY ("id")
);
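A note on generating the id on the Phoenix side: Phoenix has native sequences (`CREATE SEQUENCE` / `NEXT VALUE FOR`), which work for plain `UPSERT` statements but not for `saveToPhoenix`, which writes exactly the values you hand it. A sketch, with a made-up sequence name `HRT_SEQ`:

```sql
-- Hypothetical sequence name; Phoenix sequences are a real feature.
CREATE SEQUENCE IF NOT EXISTS "HRT_SEQ";

-- Usable in single UPSERTs; the RDD path has to compute the id in Spark
-- instead, since saveToPhoenix cannot evaluate NEXT VALUE FOR.
UPSERT INTO "HRT" ("id", "dateTime", "max", "min", "name")
VALUES (NEXT VALUE FOR "HRT_SEQ", '2018-12-21', 88, 30, 'Out of Range');
```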
Besides inserting the data from the RDD into the table above, I also need to populate the id column with an incrementing sequence number to use as the primary key. I tried the following code:
import org.apache.phoenix.spark._
sc.parallelize(newData)
  .saveToPhoenix(
    "HEART",
    Seq("id", "date", "customHrtRtZns", "max", "min", "name"),
    zkUrl = Some(phoenixServer)
  )
But I get the following errors:
import spark.implicits._
<console>:116: error: type mismatch;
found : org.apache.spark.rdd.RDD[String]
required: Seq[?]
sc.parallelize(newData).saveToPhoenix(
^
<console>:124: error: not found: value zkUrl
zkUrl = Some(phoenixServer)
^
I have already verified that inserts into the table above work with correct hardcoded values. Any suggestions on how to do this from the RDD, and how to generate the incrementing sequence number for the PK column?
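For what it's worth, here is a sketch of the direction I expect the answer to take. `newData` is already an `RDD[String]`, so wrapping it in `sc.parallelize` again is what triggers the type mismatch (`parallelize` expects a `Seq`), and `zipWithIndex` can supply the incrementing id. The JSON paths below are assumptions read off the sample above; adjust them to the real payload:

```scala
import org.apache.phoenix.spark._   // adds saveToPhoenix to RDDs of tuples
import org.json4s._
import org.json4s.jackson.JsonMethods.parse

// Flatten each JSON document into one row per heart-rate zone.
// Field names mirror the sample shown above; adjust if the payload differs.
val rows = newData.flatMap { raw =>
  implicit val formats: Formats = DefaultFormats
  val doc = parse(raw)
  for {
    day  <- (doc \ "activity-hrt").children
    date  = (day \ "date").extract[String]
    zone <- (day \ "value" \ "hrtRtZns").children
  } yield (
    date,
    (zone \ "max").extract[Int],
    (zone \ "min").extract[Int],
    (zone \ "name").extract[String]
  )
}

// zipWithIndex assigns a unique index to each row; use it as the PK.
// Note the index restarts at 0 on every run, so repeated loads would need
// to offset it by the current MAX("id") in the table.
val withIds = rows.zipWithIndex.map { case ((date, max, min, name), i) =>
  (i.toInt + 1, date, max, min, name)
}

// The column list must match the Phoenix table's columns.
withIds.saveToPhoenix(
  "HRT",
  Seq("id", "dateTime", "max", "min", "name"),
  zkUrl = Some(phoenixServer)
)
```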