I am creating an RDD `newData` in Scala by fetching data from a REST API. The contents of the RDD are JSON objects that look like this:
JObject(List(
  (activity-hrt, JArray(List(
    JObject(List(
      (date, JString(2018-12-21)),
      (value, JObject(List(
        (customHrtRtZns, JArray(List())),
        (hrtRtZns, JArray(List(
          JObject(List((max, JInt(88)),  (min, JInt(30)),  (name, JString(Out of Range)))),
          JObject(List((max, JInt(123)), (min, JInt(88)),  (name, JString(Fat Burn)))),
          JObject(List((max, JInt(150)), (min, JInt(123)), (name, JString(Cardio)))),
          JObject(List((max, JInt(220)), (min, JInt(150)), (name, JString(Peak))))
        )))
      )))
    ))
  )))
))
I want to load this RDD data into an HBase table through Phoenix, and I also need to decide on the corresponding Phoenix table structure. Below is the Phoenix table I created:
CREATE TABLE IF NOT EXISTS "HRT"(
"id" INTEGER NOT NULL,
"activitiesHrt"."dateTime" VARCHAR(32),
"value"."customHrtRtZns" VARCHAR(32),
"hrtRtZns"."max" INTEGER,
"hrtRtZns"."min" INTEGER,
"hrtRtZns"."name" VARCHAR(32),
CONSTRAINT pk PRIMARY KEY ("id")
);
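A note on generating the id on the Phoenix side: Phoenix has native sequences (`CREATE SEQUENCE` / `NEXT VALUE FOR`), which work for plain `UPSERT` statements but not for `saveToPhoenix`, which writes exactly the values you hand it. A sketch, with a made-up sequence name `HRT_SEQ`:

```sql
-- Hypothetical sequence name; Phoenix sequences are a real feature.
CREATE SEQUENCE IF NOT EXISTS "HRT_SEQ";

-- Usable in single UPSERTs; the RDD path has to compute the id in Spark
-- instead, since saveToPhoenix cannot evaluate NEXT VALUE FOR.
UPSERT INTO "HRT" ("id", "dateTime", "max", "min", "name")
VALUES (NEXT VALUE FOR "HRT_SEQ", '2018-12-21', 88, 30, 'Out of Range');
```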
Besides inserting the data from the RDD into the table above, I also need to populate the id column with an incrementing sequence number to use as the primary key. I tried the following code:
import org.apache.phoenix.spark._
sc.parallelize(newData)
  .saveToPhoenix(
    "HEART",
    Seq("id", "date", "customHrtRtZns", "max", "min", "name"),
    zkUrl = Some(phoenixServer)
  )
But I get the following errors:
import spark.implicits._
<console>:116: error: type mismatch;
found : org.apache.spark.rdd.RDD[String]
required: Seq[?]
sc.parallelize(newData).saveToPhoenix(
^
<console>:124: error: not found: value zkUrl
zkUrl = Some(phoenixServer)
^
I have already verified that inserts into the table above work with correct hardcoded values. Any suggestions on how to do this from the RDD, and how to generate the incrementing sequence number for the PK column?
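For what it's worth, here is a sketch of the direction I expect the answer to take. `newData` is already an `RDD[String]`, so wrapping it in `sc.parallelize` again is what triggers the type mismatch (`parallelize` expects a `Seq`), and `zipWithIndex` can supply the incrementing id. The JSON paths below are assumptions read off the sample above; adjust them to the real payload:

```scala
import org.apache.phoenix.spark._   // adds saveToPhoenix to RDDs of tuples
import org.json4s._
import org.json4s.jackson.JsonMethods.parse

// Flatten each JSON document into one row per heart-rate zone.
// Field names mirror the sample shown above; adjust if the payload differs.
val rows = newData.flatMap { raw =>
  implicit val formats: Formats = DefaultFormats
  val doc = parse(raw)
  for {
    day  <- (doc \ "activity-hrt").children
    date  = (day \ "date").extract[String]
    zone <- (day \ "value" \ "hrtRtZns").children
  } yield (
    date,
    (zone \ "max").extract[Int],
    (zone \ "min").extract[Int],
    (zone \ "name").extract[String]
  )
}

// zipWithIndex assigns a unique index to each row; use it as the PK.
// Note the index restarts at 0 on every run, so repeated loads would need
// to offset it by the current MAX("id") in the table.
val withIds = rows.zipWithIndex.map { case ((date, max, min, name), i) =>
  (i.toInt + 1, date, max, min, name)
}

// The column list must match the Phoenix table's columns.
withIds.saveToPhoenix(
  "HRT",
  Seq("id", "dateTime", "max", "min", "name"),
  zkUrl = Some(phoenixServer)
)
```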