Saving RDD data to HBase using Phoenix

Time: 2018-12-26 07:05:31

Tags: json scala apache-spark hbase phoenix

I am creating an RDD newData by fetching data from a REST API in Scala. The contents of the RDD are JSON objects (parsed into a json4s AST) that look like the following:

JObject(List((activity-hrt, JArray(List(
  JObject(List(
    (date, JString(2018-12-21)),
    (value, JObject(List(
      (customHrtRtZns, JArray(List())),
      (hrtRtZns, JArray(List(
        JObject(List((max, JInt(88)),  (min, JInt(30)),  (name, JString(Out of Range)))),
        JObject(List((max, JInt(123)), (min, JInt(88)),  (name, JString(Fat Burn)))),
        JObject(List((max, JInt(150)), (min, JInt(123)), (name, JString(Cardio)))),
        JObject(List((max, JInt(220)), (min, JInt(150)), (name, JString(Peak))))
      )))
    )))
  ))
)))))

I want to put this RDD data into an HBase table using Phoenix. I also need to determine the corresponding Phoenix table structure. Below is the Phoenix table structure I created:

CREATE TABLE IF NOT EXISTS "HRT"(
    "id" INTEGER,
    "activitiesHrt"."dateTime" VARCHAR(32),
    "value"."customHrtRtZns" VARCHAR(32),
    "hrtRtZns"."max" INTEGER,
    "hrtRtZns"."min" INTEGER,
    "hrtRtZns"."name" VARCHAR(32),
    CONSTRAINT pk PRIMARY KEY ("id")
);
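Before anything can be saved, the nested json4s AST has to be flattened into flat rows that line up with the table's columns. A minimal sketch of one way to do that (untested; it assumes the RDD elements are json4s JValues with the field names shown above, and toRows is a hypothetical helper name):

```scala
import org.json4s._

// Hypothetical helper: flatten one day's record into (date, max, min, name)
// rows, one tuple per heart-rate zone, matching the table columns above.
def toRows(json: JValue): List[(String, Int, Int, String)] =
  for {
    JObject(day)                  <- (json \ "activity-hrt").children
    JField("date", JString(date)) <- day
    JField("value", value)        <- day
    JObject(zone)                 <- (value \ "hrtRtZns").children
    JField("max", JInt(max))      <- zone
    JField("min", JInt(min))      <- zone
    JField("name", JString(name)) <- zone
  } yield (date, max.toInt, min.toInt, name)
```

Rows for every record could then be produced with newData.flatMap(toRows) before writing to Phoenix.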

Besides inserting the data from the RDD into the table above, I also need to insert an incrementing sequence number into the id column for the primary key. I tried the following code:

import org.apache.phoenix.spark._

sc.parallelize(newData)
  .saveToPhoenix(
    "HEART",
    Seq("id", "date", "customHrtRtZns", "max", "min", "name"),
    zkUrl = Some(phoenixServer)
  )

But I get the following errors:

import spark.implicits._
<console>:116: error: type mismatch;
 found   : org.apache.spark.rdd.RDD[String]
 required: Seq[?]
               sc.parallelize(newData).saveToPhoenix(
                              ^
<console>:124: error: not found: value zkUrl
               zkUrl = Some(phoenixServer)
               ^

I have already verified that inserts into the table above work with correct hard-coded values. Any suggestions on how to do this from the RDD, and how to generate the incrementing sequence number for the PK column?
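For reference, one possible direction, sketched and unverified: the first compiler error says newData is already an RDD, so it must not be wrapped in sc.parallelize (which expects a local Seq); the "not found: value zkUrl" error suggests the parentheses around the saveToPhoenix call were not balanced. An incrementing id can be derived with zipWithIndex. The flattened (date, max, min, name) tuple shape assumed below is hypothetical:

```scala
import org.apache.phoenix.spark._

// Assumption: the records have already been flattened into an
// RDD[(String, Int, Int, String)] of (date, max, min, name) per zone.
val rows = newData
  .zipWithIndex()                          // assigns a 0-based index to each element
  .map { case ((date, max, min, name), idx) =>
    (idx.toInt, date, max, min, name)      // the index becomes the "id" PK value
  }

rows.saveToPhoenix(
  "HRT",                                   // must match the table name in the DDL
  Seq("id", "dateTime", "max", "min", "name"),
  zkUrl = Some(phoenixServer)              // note the balanced closing parenthesis
)
```

Note that zipWithIndex only numbers the elements of this one RDD, so ids would collide across repeated loads; for ids that survive multiple loads, Phoenix sequences (CREATE SEQUENCE with NEXT VALUE FOR in an UPSERT) are the SQL-side alternative.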

0 Answers:

There are no answers yet