使用Spark

时间:2018-09-17 05:56:47

标签: scala maven apache-spark apache-spark-sql hbase

我正在使用Spark连接到Hbase。我已经添加了所有依赖项,但仍然出现此异常。请帮助我解决此问题需要添加的JAR。

SPARK_MAJOR_VERSION is set to 2, using Spark2
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/usr/hdp/2.6.5.0-292/spark2/jars/slf4j-log4j12                                                                                        -1.7.16.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/usr/hdp/2.6.5.0-292/spark2/jars/slf4j-log4j12                                                                                        -1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/usr/hdp/2.6.5.0-292/spark2/jars/phoenix-4.7.0                                                                                        .2.6.5.0-292-client.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/usr/hdp/2.6.5.0-292/spark2/jars/phoenix-4.7.0                                                                                        .2.6.5.0-292-thin-client.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLeve                                                                                        l(newLevel).
18/09/17 05:34:36 WARN Utils: Service 'SparkUI' could not bind on port 4040. Att                                                                                        empting port 4041.
Spark context Web UI available at http://sandbox-hdp.hortonworks.com:4041
Spark context available as 'sc' (master = local[*], app id = local-1537162476668).
Spark session available as 'spark'.
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /___/ .__/\_,_/_/ /_/\_\   version 2.3.0.2.6.5.0-292
      /_/

Using Scala version 2.11.8 (OpenJDK 64-Bit Server VM, Java 1.8.0_171)
Type in expressions to have them evaluated.
Type :help for more information.

scala> :paste
// Entering paste mode (ctrl-D to finish)

 import org.apache.spark.sql.{SQLContext, _}
 import org.apache.spark.sql.execution.datasources.hbase._
 import org.apache.spark.{SparkConf, SparkContext}
 import spark.sqlContext.implicits._
 import org.apache.hadoop.hbase.HBaseConfiguration
 import org.apache.hadoop.hbase.client.{ConnectionFactory,HBaseAdmin,HTable,Put,Get}
 import org.apache.hadoop.hbase.util.Bytes
 import org.apache.hadoop.hbase.mapreduce.TableInputFormat
 import org.apache.hadoop.hbase.client.HBaseAdmin
 import org.apache.hadoop.hbase.{HTableDescriptor,HColumnDescriptor}

  def catalog = s"""{
     |"table":{"namespace":"default", "name":"Contacts"},
     |"rowkey":"key",
     |"columns":{
     |"rowkey":{"cf":"rowkey", "col":"key", "type":"string"},
     |"officeAddress":{"cf":"Office", "col":"Address", "type":"string"},
     |"officePhone":{"cf":"Office", "col":"Phone", "type":"string"},
     |"personalName":{"cf":"Personal", "col":"Name", "type":"string"},
     |"personalPhone":{"cf":"Personal", "col":"Phone", "type":"string"}
     |}
 |}""".stripMargin

      def withCatalog(cat: String): DataFrame = {
         spark.sqlContext
         .read
         .options(Map(HBaseTableCatalog.tableCatalog->cat))
         .format("org.apache.spark.sql.execution.datasources.hbase")
         .load()
     }
 val df = withCatalog(catalog)
 df.registerTempTable("contacts")
 val query = spark.sqlContext.sql("select personalName, officeAddress from contacts")
 query.show()   <p>

//正在退出粘贴模式,现在正在解释。

warning: there was one deprecation warning; re-run with -deprecation for details
java.lang.NoClassDefFoundError: org/apache/hadoop/hbase/shaded/protobuf/generated/MasterProtos$MasterService$BlockingInterface
at java.lang.ClassLoader.defineClass1(Native Method)
at java.lang.ClassLoader.defineClass(ClassLoader.java:763)
at java.security.SecureClassLoader.defineClass(SecureClassLoader.java:142)
at java.net.URLClassLoader.defineClass(URLClassLoader.java:467)
at java.net.URLClassLoader.access$100(URLClassLoader.java:73)
at java.net.URLClassLoader$1.run(URLClassLoader.java:368)
at java.net.URLClassLoader$1.run(URLClassLoader.java:362)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:361)
at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:349)
at java.lang.ClassLoader.loadClass(ClassLoader.java:357)

下面是Spark-Jar的Folder中可用的Jar

hbase-0.94.2.jar
hbase-annotations-1.2.0.jar
hbase-client-2.1.0.jar
hbase-common-2.1.0.jar
hbase-hadoop-compat-2.1.0.jar
hbase-hadoop2-compat-2.1.0.jar
hbase-it-1.1.2.2.6.5.0-292.jar
hbase-prefix-tree-1.1.2.2.6.5.0-292.jar
hbase-procedure-1.1.2.2.6.5.0-292.jar
hbase-protocol-2.1.0.jar
hbase-server-2.1.0.jar
hbase-spark-1.2.0-cdh5.8.3.jar
hbase-spark-1.1.2.2.6.5.0-292.jar
hbase-thrift-1.1.2.2.6.5.0-292.jar
hive-hbase-handler-0.12.0-cdh5.1.3.jar
hive-hbase-handler-3.1.0.jar
protobuf-java-3.5.1.jar

请提供一些建议,例如我想添加到jars文件夹中以便连接到hbase的jar。

1 个答案:

答案 0 :(得分:2)

似乎您缺少一个shc-core jar,该jar用于将数据帧写入到hortonworks禁止的hbase中。

从hortonworks-shc-connector导入软件包时

import org.apache.spark.sql.execution.datasources.hbase._

您需要将罐子添加到spark应用程序中。

获取jar的shc-core连接器的步骤:

首先获取hortonworks-spark/hbase connector github存储库,然后检出到您在环境中使用的具有hbase和hadoop版本的适当分支,并使用

进行构建
mvn clean install -DskipTests

执行上面的操作后,您的~/.m2/repository/com/hortonworks/shc/中将有一个jar

将此罐子用于火花应用。

您可以将其添加到spark-jar文件夹中,也可以使用--jars标志将其传递到spark-submit / spark-shell中

然后使用try执行您尝试运行的代码。

我遵循了相同的步骤,并且能够使用HCatalog从hbase读取

示例

spark-shell --jars shc-core-1.1.3-2.4-s_2.11.jar

SPARK_MAJOR_VERSION is set to 2, using Spark2
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use 
setLogLevel(newLevel).
Spark context Web UI available at http://sandbox-hdp.hortonworks.com:4040  
Spark context available as 'sc' (master = yarn, app id = 
application_1592322799672_0007).
Spark session available as 'spark'.
Welcome to
   ____              __
  / __/__  ___ _____/ /__
 _\ \/ _ \/ _ `/ __/  '_/
/___/ .__/\_,_/_/ /_/\_\   version 2.4.0.7.0.3.0-79
   /_/

Using Scala version 2.11.12 (OpenJDK 64-Bit Server VM, Java 1.8.0_232)
Type in expressions to have them evaluated.
Type :help for more information.

scala> :paste
// Entering paste mode (ctrl-D to finish)

import org.apache.spark.sql.{SQLContext, _}
 import org.apache.spark.sql.execution.datasources.hbase._
 import org.apache.spark.{SparkConf, SparkContext}
 import spark.sqlContext.implicits._
 import org.apache.hadoop.hbase.HBaseConfiguration
 import org.apache.hadoop.hbase.client.{ConnectionFactory,HBaseAdmin,HTable,Put,Get}
 import org.apache.hadoop.hbase.util.Bytes
 import org.apache.hadoop.hbase.mapreduce.TableInputFormat
 import org.apache.hadoop.hbase.client.HBaseAdmin
 import org.apache.hadoop.hbase.{HTableDescriptor,HColumnDescriptor}

  def catalog = s"""{
     |"table":{"namespace":"default", "name":"Contacts"},
     |"rowkey":"key",
     |"columns":{
     |"rowkey":{"cf":"rowkey", "col":"key", "type":"string"},
     |"officeAddress":{"cf":"Office", "col":"Address", "type":"string"},
     |"officePhone":{"cf":"Office", "col":"Phone", "type":"string"},
     |"personalName":{"cf":"Personal", "col":"Name", "type":"string"},
     |"personalPhone":{"cf":"Personal", "col":"Phone", "type":"string"}
     |}
 |}""".stripMargin

      def withCatalog(cat: String): DataFrame = {
         spark.sqlContext
         .read
         .options(Map(HBaseTableCatalog.tableCatalog->cat))
         .format("org.apache.spark.sql.execution.datasources.hbase")
         .load()
     }
 val df = withCatalog(catalog)
 df.registerTempTable("contacts")
 val query = spark.sqlContext.sql("select personalName, officeAddress from contacts")
 query.show()

// Exiting paste mode, now interpreting.

warning: there was one deprecation warning; re-run with -deprecation for details
Hive Session ID = 5cc02976-98c4-447f-9ba0-e70c4a3c4ab1
+------------+-------------+                                                    
|personalName|officeAddress|
+------------+-------------+
|John Jackson| 40 Ellis St.|
|John Jackson| 40 Ellis St.|
+------------+-------------+

import org.apache.spark.sql.{SQLContext, _}
import org.apache.spark.sql.execution.datasources.hbase._
import org.apache.spark.{SparkConf, SparkContext}
import spark.sqlContext.implicits._
import org.apache.hadoop.hbase.HBaseConfiguration
import org.apache.hadoop.hbase.client.{ConnectionFactory, HBaseAdmin, HTable, Put, Get}
import org.apache.hadoop.hbase.util.Bytes
import org.apache.hadoop.hbase.mapreduce.TableInputFormat
import org.apache.hadoop.hbase.client.HBaseAdmin
import org.apache.hadoop.hbase.{HTableDescriptor, HColumnDescriptor}
catalog: String
withCatalog: (cat: String)org.apache.spark.sql.DataFrame
df: org.apache.spark.sql.DataFrame = [rowkey: string, officeAddress: string ... 3 more fields]
query: org.apache.spark.sql.DataFrame = [personalName: string, officeAddress: string]

scala> query.show()
+------------+-------------+
|personalName|officeAddress|
+------------+-------------+
|John Jackson| 40 Ellis St.|
|John Jackson| 40 Ellis St.|
+------------+-------------+


scala> 

堆栈版本:

HBase 2.2.0
Hadoop 3.1.1
Spark 2.4.0
Scala 2.11.12