Unable to communicate with HBase via Spark

Posted: 2018-07-02 22:52:26

Tags: apache-spark hadoop hbase apache-zookeeper

I have a project that needs Spark and HBase configured in a local environment. I downloaded spark-2.2.1, hadoop 2.7, and hbase 1.1.8 and configured them accordingly on a standalone single-node Ubuntu 14.04 OS. I am able to push data from Spark into HDFS, but not into HBase.

core-site.xml:

    <?xml version="1.0"?>
    <?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
    <configuration>
        <property>
            <name>fs.defaultFS</name>
            <value>hdfs://localhost:9000</value>
        </property>
    </configuration>

hdfs-site.xml:

    [root@localhost conf]# cat hdfs-site.xml
    <?xml version="1.0"?>
    <?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
    <configuration>
        <property>
            <name>dfs.replication</name>
            <value>1</value>
        </property>
        <property>
            <name>dfs.namenode.rpc-bind-host</name>
            <value>0.0.0.0</value>
        </property>
        <property>
            <name>dfs.namenode.servicerpc-bind-host</name>
            <value>0.0.0.0</value>
        </property>
    </configuration>

spark-env.sh

     [root@localhost conf]# cat spark-env.sh
     export JAVA_HOME=/usr/lib/jvm/java-8-oracle
     export SPARK_WORKER_MEMORY=1g
     export SPARK_WORKER_INSTANCES=1
     export SPARK_MASTER_IP=127.0.0.1
     export SPARK_MASTER_PORT=7077
     export SPARK_WORKER_DIR=/app/spark/tmp

     # Options read in YARN client mode

     export HADOOP_CONF_DIR=/opt/hadoop/etc/hadoop
     export SPARK_EXECUTOR_INSTANCES=1
     export SPARK_EXECUTOR_CORES=1
     export SPARK_EXECUTOR_MEMORY=1G
     export SPARK_DRIVER_MEMORY=1G
     export SPARK_YARN_APP_NAME=Spark
     export SPARK_CLASSPATH=/opt/hbase/lib/*

hbase-site.xml:

    [root@localhost conf]# cat hbase-site.xml
    <?xml version="1.0"?>
    <?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
    <configuration>
        <property>
            <name>hbase.rootdir</name>
            <value>hdfs://localhost:9000/hbase</value>
        </property>
        <property>
            <name>hbase.cluster.distributed</name>
            <value>true</value>
        </property>
        <property>
            <name>hbase.zookeeper.quorum</name>
            <value>localhost</value>
        </property>
        <property>
            <name>hbase.zookeeper.property.dataDir</name>
            <value>hdfs://localhost:9000/zookeeper</value>
        </property>
        <property>
            <name>hbase.master.dns.interface</name>
            <value>default</value>
        </property>
        <property>
            <name>hbase.master.ipc.address</name>
            <value>localhost</value>
        </property>
        <property>
            <name>hbase.regionserver.dns.interface</name>
            <value>default</value>
        </property>
        <property>
            <name>hbase.regionserver.ipc.address</name>
            <value>HOSTNAME</value>
        </property>
        <property>
            <name>hbase.zookeeper.dns.interface</name>
            <value>default</value>
        </property>
    </configuration>

spark-defaults.conf:

    [root@localhost conf]# cat spark-defaults.conf
    spark.master              spark://127.0.0.1:7077
    spark.yarn.dist.files     /opt/spark/conf/hbase-site.xml

Error: even with the HBase lib (jars) exported in spark-env.sh, I am unable to import the HBase classes (for example HBaseConfiguration):

scala> import org.apache.hadoop.hbase.HBaseConfiguration
<console>:23: error: object hbase is not a member of package org.apache.hadoop
       import org.apache.hadoop.hbase.HBaseConfiguration
                                ^

If I load these jars via --driver-class-path instead:

 spark-shell --master local --driver-class-path=/opt/hbase/lib/*
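
(A hedged aside: --driver-class-path only affects the driver JVM. If the shell is later run against the standalone master from spark-defaults.conf instead of local mode, the HBase jars would also need to reach the executors, for example via spark.executor.extraClassPath or --jars. A sketch, reusing the /opt/hbase/lib path from above:)

    spark-shell --master spark://127.0.0.1:7077 \
        --driver-class-path "/opt/hbase/lib/*" \
        --conf spark.executor.extraClassPath="/opt/hbase/lib/*"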

scala>     conf.set("hbase.zookeeper.quorum","localhost")

scala>     conf.set("hbase.zookeeper.property.clientPort", "2181")

scala>     val connection: Connection = ConnectionFactory.createConnection(conf)
connection: org.apache.hadoop.hbase.client.Connection = hconnection-0x2a4cb8ae

scala>     val tableName = connection.getTable(TableName.valueOf("employee"))
tableName: org.apache.hadoop.hbase.client.Table = employee;hconnection-0x2a4cb8ae

scala>     val insertData = new Put(Bytes.toBytes("1"))
insertData: org.apache.hadoop.hbase.client.Put = {"totalColumns":0,"row":"1","families":{}}

scala>
     |     insertData.addColumn(Bytes.toBytes("emp personal data "), Bytes.toBytes("Name"), Bytes.toBytes("Jeevan"))
res3: org.apache.hadoop.hbase.client.Put = {"totalColumns":1,"row":"1","families":{"emp personal data ":[{"qualifier":"Name","vlen":6,"tag":[],"timestamp":9223372036854775807}]}}

scala>     insertData.addColumn(Bytes.toBytes("emp personal data "), Bytes.toBytes("City"), Bytes.toBytes("San Jose"))
res4: org.apache.hadoop.hbase.client.Put = {"totalColumns":2,"row":"1","families":{"emp personal data ":[{"qualifier":"Name","vlen":6,"tag":[],"timestamp":9223372036854775807},{"qualifier":"City","vlen":8,"tag":[],"timestamp":9223372036854775807}]}}

scala>     insertData.addColumn(Bytes.toBytes("emp personal data "), Bytes.toBytes("Company"), Bytes.toBytes("Cisco"))
res5: org.apache.hadoop.hbase.client.Put = {"totalColumns":3,"row":"1","families":{"emp personal data ":[{"qualifier":"Name","vlen":6,"tag":[],"timestamp":9223372036854775807},{"qualifier":"City","vlen":8,"tag":[],"timestamp":9223372036854775807},{"qualifier":"Company","vlen":5,"tag":[],"timestamp":9223372036854775807}]}}

scala>     insertData.addColumn(Bytes.toBytes("emp personal data "), Bytes.toBytes("location"), Bytes.toBytes("San Jose"))
res6: org.apache.hadoop.hbase.client.Put = {"totalColumns":4,"row":"1","families":{"emp personal data ":[{"qualifier":"Name","vlen":6,"tag":[],"timestamp":9223372036854775807},{"qualifier":"City","vlen":8,"tag":[],"timestamp":9223372036854775807},{"qualifier":"Company","vlen":5,"tag":[],"timestamp":9223372036854775807},{"qualifier":"location","vlen":8,"tag":[],"timestamp":9223372036854775807}]}}

However, I do not see any new columns in HBase.
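
(One way to verify whether the row actually reached HBase is to scan the table from the HBase shell; 'employee' is the table used above:)

    hbase shell
    hbase(main):001:0> scan 'employee'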

Can anyone help? Any pointer to the correct configuration would be great. Do I need any additional ZooKeeper configuration? Thanks for your help.

1 Answer:

Answer 0: (score: 0)

The insertData object has to be put into the HTable. Use tableName.put(insertData).
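
A minimal end-to-end sketch of the write path under the same assumptions as the question (table "employee", column family "emp personal data ", local ZooKeeper on port 2181); the decisive step is the final table.put, followed by closing the resources:

    import org.apache.hadoop.hbase.{HBaseConfiguration, TableName}
    import org.apache.hadoop.hbase.client.{Connection, ConnectionFactory, Put, Table}
    import org.apache.hadoop.hbase.util.Bytes

    // Client configuration pointing at the local ZooKeeper quorum.
    val conf = HBaseConfiguration.create()
    conf.set("hbase.zookeeper.quorum", "localhost")
    conf.set("hbase.zookeeper.property.clientPort", "2181")

    val connection: Connection = ConnectionFactory.createConnection(conf)
    val table: Table = connection.getTable(TableName.valueOf("employee"))

    // Assemble the Put as in the question...
    val insertData = new Put(Bytes.toBytes("1"))
    insertData.addColumn(Bytes.toBytes("emp personal data "), Bytes.toBytes("Name"), Bytes.toBytes("Jeevan"))
    insertData.addColumn(Bytes.toBytes("emp personal data "), Bytes.toBytes("City"), Bytes.toBytes("San Jose"))

    // ...then actually send it to the region server; without this call nothing is written.
    table.put(insertData)

    // Release client resources.
    table.close()
    connection.close()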