pyspark - Spark HBase connector throws "Failed to find data source"

Time: 2018-09-25 16:02:56

Tags: pyspark hbase

I am trying to connect to HBase from PySpark using the SHC (Spark-HBase Connector) API, following this link:

https://community.hortonworks.com/questions/143802/read-hbase-with-pyspark-from-jupyter-notebook.html

Sample code:

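from pyspark.sql import SparkSession
spark = SparkSession.builder.appName("Hbase Read").getOrCreate()
# SHC data source name (Spark also tries <name>.DefaultSource when loading it)
data_source_format = 'org.apache.spark.sql.execution.datasources.hbase'
# SHC catalog mapping HBase cells to DataFrame columns; split()/join strips all whitespace
catalog = ''.join("""{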
    "table":{"namespace":"default", "name":"table"},
    "rowkey":"key",
    "columns":{
        "firstcol":{"cf":"rowkey", "col":"key", "type":"string"},
        "secondcol":{"cf":"cf", "col":"col1", "type":"int"}
    }
}""".split())
df = spark.read \
    .options(catalog=catalog) \
    .format(data_source_format) \
    .load()
df.show()
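The catalog above is built by stripping every whitespace character out of a JSON literal. That works for this mapping, but it would also mangle any name or value that contained a space. A minimal alternative sketch, assuming the same table and column mapping as the sample, builds the catalog from a Python dict with the standard `json` module; the helper name `build_catalog` is hypothetical:

```python
import json


def build_catalog():
    """Return the same SHC catalog as the sample, serialized without whitespace."""
    # Hypothetical helper: mirrors the table/column mapping used in the question.
    catalog = {
        "table": {"namespace": "default", "name": "table"},
        "rowkey": "key",
        "columns": {
            # In SHC catalogs, cf "rowkey" marks a column taken from the HBase row key.
            "firstcol": {"cf": "rowkey", "col": "key", "type": "string"},
            "secondcol": {"cf": "cf", "col": "col1", "type": "int"},
        },
    }
    # separators=(",", ":") drops the spaces json.dumps inserts by default.
    return json.dumps(catalog, separators=(",", ":"))


if __name__ == "__main__":
    print(build_catalog())
```

The returned string is identical to the one the `''.join(...split())` trick produces here and could be passed unchanged to `.options(catalog=...)`.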

spark-submit command:

spark-submit  --packages com.hortonworks:shc:1.1.1-2.1-s_2.11 --repositories http://repo.hortonworks.com/content/groups/public/ TestHbaseRead.py
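A `ClassNotFoundException` for a data source means the class could not be loaded, i.e. no jar on the driver's classpath provides it. One thing worth ruling out is a mismatch between the package build and the Spark/Scala versions actually running. The sketch below assumes, based only on the string `1.1.1-2.1-s_2.11`, that the version follows the layout `<connector>-<spark major.minor>-s_<scala binary version>`; the names `ShcCoordinate`, `parse_shc_coordinate` and `matches_runtime` are hypothetical:

```python
import re
from typing import NamedTuple


class ShcCoordinate(NamedTuple):
    group: str
    artifact: str
    connector_version: str
    spark_version: str
    scala_version: str


# Assumed version layout, inferred from "1.1.1-2.1-s_2.11":
# <connector>-<spark major.minor>-s_<scala binary version>
_VERSION_RE = re.compile(r"^(?P<conn>[\d.]+)-(?P<spark>\d+\.\d+)-s_(?P<scala>\d+\.\d+)$")


def parse_shc_coordinate(coord: str) -> ShcCoordinate:
    """Split a group:artifact:version string as passed to --packages."""
    group, artifact, version = coord.split(":")
    m = _VERSION_RE.match(version)
    if m is None:
        raise ValueError(f"unexpected SHC version layout: {version!r}")
    return ShcCoordinate(group, artifact, m["conn"], m["spark"], m["scala"])


def matches_runtime(coord: str, spark_version: str, scala_version: str) -> bool:
    """True if the package targets the given Spark major.minor and Scala binary version."""
    parsed = parse_shc_coordinate(coord)
    spark_mm = ".".join(spark_version.split(".")[:2])
    scala_bin = ".".join(scala_version.split(".")[:2])
    return parsed.spark_version == spark_mm and parsed.scala_version == scala_bin
```

The running Spark version is available as `spark.version`, and `spark-submit --version` prints the Scala version Spark was built with; those two strings can be fed to `matches_runtime` together with the coordinate from the command above.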

I get the following error.

Error log:

: java.lang.ClassNotFoundException: Failed to find data source: org.apache.spark.sql.execution.datasources.hbase. Please find packages at http://spark.apache.org/third-party-projects.html
        at org.apache.spark.sql.execution.datasources.DataSource$.lookupDataSource(DataSource.scala:635)
        at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:190)
        at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:164)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:498)
        at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
        at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
        at py4j.Gateway.invoke(Gateway.java:282)
        at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
        at py4j.commands.CallCommand.execute(CallCommand.java:79)
        at py4j.GatewayConnection.run(GatewayConnection.java:214)
        at java.lang.Thread.run(Thread.java:748)
Caused by: java.lang.ClassNotFoundException: org.apache.spark.sql.execution.datasources.hbase.DefaultSource
        at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
        at org.apache.spark.sql.execution.datasources.DataSource$$anonfun$23$$anonfun$apply$15.apply(DataSource.scala:618)
        at org.apache.spark.sql.execution.datasources.DataSource$$anonfun$23$$anonfun$apply$15.apply(DataSource.scala:618)
        at scala.util.Try$.apply(Try.scala:192)
        at org.apache.spark.sql.execution.datasources.DataSource$$anonfun$23.apply(DataSource.scala:618)
        at org.apache.spark.sql.execution.datasources.DataSource$$anonfun$23.apply(DataSource.scala:618)
        at scala.util.Try.orElse(Try.scala:84)
        at org.apache.spark.sql.execution.datasources.DataSource$.lookupDataSource(DataSource.scala:618)
        ... 13 more
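The two exceptions in the log reflect a fallback in Spark's `DataSource.lookupDataSource`: the `Caused by` line shows that, after failing to load `org.apache.spark.sql.execution.datasources.hbase`, Spark also tried `org.apache.spark.sql.execution.datasources.hbase.DefaultSource`. The following is a deliberately simplified Python model of that lookup, not Spark's actual code. The function `resolve_data_source`, the exception `DataSourceNotFound`, and the `available_classes` parameter (standing in for what the classloader can see) are all hypothetical, and short-name registration and built-in aliases are left out:

```python
from typing import Iterable


class DataSourceNotFound(Exception):
    """Hypothetical stand-in for Spark's 'Failed to find data source' error."""


def resolve_data_source(provider: str, available_classes: Iterable[str]) -> str:
    """Simplified model of resolving a format() string to a class name.

    Only the fallback visible in the stack trace is modeled: try the name
    itself, then "<name>.DefaultSource".
    """
    classes = set(available_classes)
    for candidate in (provider, provider + ".DefaultSource"):
        if candidate in classes:
            return candidate
    raise DataSourceNotFound(
        f"Failed to find data source: {provider} "
        f"(tried {provider} and {provider}.DefaultSource)"
    )
```

With no connector classes visible, the model fails on both candidates, which matches the log: the outer `Failed to find data source` error wraps the inner `ClassNotFoundException` for `...hbase.DefaultSource`.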

0 answers:

No answers yet