Reading from HBase using PySpark

Date: 2018-07-12 13:13:22

Tags: python apache-spark pyspark hbase

I am trying to write to and read from HBase using PySpark.

Environment:

  • CDH 5.13
  • HBase 1.2.0
  • Spark 2.3 (installed as a parcel)
  • Python 3.6
  • PyCharm

I am using the HBase Spark Connector (shc-core) 1.1.1-2.1-s_2.11:

http://repo.hortonworks.com/content/repositories/releases/com/hortonworks/shc-core/1.1.1-2.1-s_2.11/

My code is:

from pyspark import SparkConf, SQLContext
from pyspark.sql import SparkSession
from datetime import datetime
import json

conf = (SparkConf()
       .setAppName("RW_from_HBase"))

spark = SparkSession.builder \
     .appName(" ") \
     .config(conf=conf) \
     .getOrCreate()

sc = spark.sparkContext
sqlc = SQLContext(sc)

data_source_format = 'org.apache.spark.sql.execution.datasources.hbase'

catalog = json.dumps(
    {
        "table":{"namespace":"spark", "name":"test_table"},
        "rowkey":"id",
        "columns":{
            "id":{"cf":"rowkey", "col":"id", "type":"string"},
            "filename":{"cf":"content", "col":"filename", "type":"string"},
            "created_ts":{"cf":"content", "col":"created_ts", "type":"string"},
            "html":{"cf":"content", "col":"html", "type":"string"}
        }
    })

# Writing into HBase (mydf is an existing DataFrame; its creation is not shown here)
mydf.write\
    .options(catalog=catalog, newtable=5)\
    .format(data_source_format)\
    .save()

# Reading from HBase
df = sqlc.read\
    .options(catalog=catalog)\
    .format(data_source_format)\
    .load()

df.show()

My spark-submit arguments are:

--master local[*] --packages com.databricks:spark-avro_2.11:4.0.0,com.hortonworks:shc-core:1.1.1-2.1-s_2.11 --repositories http://repo.hortonworks.com/content/repositories/releases/ --queue PyCharmSpark pyspark-shell
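For context, here is a minimal sketch of how these arguments can be handed to PySpark when the script is launched from PyCharm. It assumes they are supplied through the PYSPARK_SUBMIT_ARGS environment variable (the trailing pyspark-shell token suggests this, but the actual run configuration is not shown). The variable names `packages` and `submit_args` are only illustrative.

```python
import os

# Maven coordinates taken from the submit line above.
packages = [
    "com.databricks:spark-avro_2.11:4.0.0",
    "com.hortonworks:shc-core:1.1.1-2.1-s_2.11",
]

# Rebuild the same argument string piece by piece so each option is visible.
submit_args = " ".join([
    "--master local[*]",
    "--packages " + ",".join(packages),
    "--repositories http://repo.hortonworks.com/content/repositories/releases/",
    "--queue PyCharmSpark",
    "pyspark-shell",  # must stay last when PYSPARK_SUBMIT_ARGS is used
])

# Assumption: the variable has to be set before the first SparkSession is
# created, because it is only read when the JVM gateway is launched.
os.environ["PYSPARK_SUBMIT_ARGS"] = submit_args
```

With this in place, the SparkSession created in the script resolves the listed packages at startup, which is what produces the Ivy resolution output shown further down.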

Writing to HBase works fine: the data from mydf is saved to the HBase table.

Reading only works until a Spark action is triggered. df.show() fails with the following error:

WARNING: Running spark-class from user-defined location.
http://repo.hortonworks.com/content/repositories/releases/ added as a remote repository with the name: repo-1
Ivy Default Cache set to: /home/cloudera/.ivy2/cache
The jars for the packages stored in: /home/cloudera/.ivy2/jars
:: loading settings :: url = jar:file:/opt/cloudera/parcels/SPARK2-2.3.0.cloudera2-1.cdh5.13.3.p0.316101/lib/spark2/jars/ivy-2.4.0.jar!/org/apache/ivy/core/settings/ivysettings.xml
com.databricks#spark-avro_2.11 added as a dependency
com.hortonworks#shc-core added as a dependency
:: resolving dependencies :: org.apache.spark#spark-submit-parent;1.0
    confs: [default]
    found com.databricks#spark-avro_2.11;4.0.0 in central
    found org.slf4j#slf4j-api;1.7.5 in central
    found org.apache.avro#avro;1.7.6 in central
    found org.codehaus.jackson#jackson-core-asl;1.9.13 in central
    found org.codehaus.jackson#jackson-mapper-asl;1.9.13 in central
    found com.thoughtworks.paranamer#paranamer;2.3 in central
    found org.xerial.snappy#snappy-java;1.0.5 in central
    found org.apache.commons#commons-compress;1.4.1 in central
    found org.tukaani#xz;1.0 in central
    found com.hortonworks#shc-core;1.1.1-2.1-s_2.11 in repo-1
    found org.apache.hbase#hbase-server;1.1.2 in central
    found org.apache.hbase#hbase-protocol;1.1.2 in central
    found org.apache.hbase#hbase-annotations;1.1.2 in central
    found com.github.stephenc.findbugs#findbugs-annotations;1.3.9-1 in central
    found log4j#log4j;1.2.17 in central
    found junit#junit;4.11 in central
    found org.hamcrest#hamcrest-core;1.3 in central
    found com.google.protobuf#protobuf-java;2.5.0 in central
    found org.apache.hbase#hbase-procedure;1.1.2 in central
    found com.google.guava#guava;12.0.1 in central
    found com.google.code.findbugs#jsr305;1.3.9 in central
    found org.apache.hbase#hbase-client;1.1.2 in central
    found commons-codec#commons-codec;1.9 in central
    found commons-io#commons-io;2.4 in central
    found commons-lang#commons-lang;2.6 in central
    found io.netty#netty-all;4.0.23.Final in central
    found org.apache.zookeeper#zookeeper;3.4.6 in central
    found org.slf4j#slf4j-api;1.7.7 in central
    found org.slf4j#slf4j-log4j12;1.6.1 in central
    found org.apache.htrace#htrace-core;3.1.0-incubating in central
    found org.jruby.jcodings#jcodings;1.0.8 in central
    found org.jruby.joni#joni;2.1.2 in central
    found commons-httpclient#commons-httpclient;3.1 in central
    found commons-collections#commons-collections;3.2.1 in central
    found com.yammer.metrics#metrics-core;2.2.0 in central
    found com.sun.jersey#jersey-core;1.9 in central
    found com.sun.jersey#jersey-server;1.9 in central
    found commons-cli#commons-cli;1.2 in central
    found org.apache.commons#commons-math;2.2 in central
    found org.mortbay.jetty#jetty;6.1.26 in central
    found org.mortbay.jetty#jetty-util;6.1.26 in central
    found org.mortbay.jetty#jetty-sslengine;6.1.26 in central
    found org.mortbay.jetty#jsp-2.1;6.1.14 in central
    found org.mortbay.jetty#jsp-api-2.1;6.1.14 in central
    found org.mortbay.jetty#servlet-api-2.5;6.1.14 in central
    found org.codehaus.jackson#jackson-jaxrs;1.9.13 in central
    found tomcat#jasper-compiler;5.5.23 in central
    found org.jamon#jamon-runtime;2.3.1 in central
    found com.lmax#disruptor;3.3.0 in central
    found org.apache.hbase#hbase-prefix-tree;1.1.2 in central
    found org.mortbay.jetty#servlet-api;2.5-20081211 in central
    found tomcat#jasper-runtime;5.5.23 in central
    found commons-el#commons-el;1.0 in central
    found org.apache.hbase#hbase-common;1.1.2 in central
    found org.apache.phoenix#phoenix-core;4.9.0-HBase-1.1 in central
    found org.apache.tephra#tephra-api;0.9.0-incubating in central
    found org.apache.tephra#tephra-hbase-compat-1.1;0.9.0-incubating in central
    found org.apache.tephra#tephra-core;0.9.0-incubating in central
    found com.google.code.gson#gson;2.2.4 in central
    found com.google.guava#guava;13.0.1 in central
    found com.google.inject#guice;3.0 in central
    found javax.inject#javax.inject;1 in central
    found aopalliance#aopalliance;1.0 in central
    found org.sonatype.sisu.inject#cglib;2.2.1-v20090111 in central
    found asm#asm;3.1 in central
    found com.google.inject.extensions#guice-assistedinject;3.0 in central
    found ch.qos.logback#logback-classic;1.0.9 in central
    found ch.qos.logback#logback-core;1.0.9 in central
    found org.apache.thrift#libthrift;0.9.0 in central
    found org.apache.httpcomponents#httpcore;4.1.3 in central
    found it.unimi.dsi#fastutil;6.5.6 in central
    found org.apache.twill#twill-common;0.6.0-incubating in central
    found com.google.code.findbugs#jsr305;2.0.1 in central
    found org.apache.twill#twill-core;0.6.0-incubating in central
    found org.apache.twill#twill-api;0.6.0-incubating in central
    found org.apache.twill#twill-discovery-api;0.6.0-incubating in central
    found org.apache.twill#twill-zookeeper;0.6.0-incubating in central
    found org.apache.twill#twill-discovery-core;0.6.0-incubating in central
    found org.ow2.asm#asm-all;5.0.2 in central
    found io.dropwizard.metrics#metrics-core;3.1.0 in central
    found org.antlr#antlr-runtime;3.5.2 in central
    found jline#jline;2.11 in central
    found sqlline#sqlline;1.2.0 in central
    found joda-time#joda-time;1.6 in central
    found com.github.stephenc.jcip#jcip-annotations;1.0-1 in central
    found junit#junit;4.12 in central
    found org.apache.httpcomponents#httpclient;4.0.1 in central
    found commons-logging#commons-logging;1.2 in central
    found org.iq80.snappy#snappy;0.3 in central
    found commons-collections#commons-collections;3.2.2 in central
    found org.apache.commons#commons-csv;1.0 in central
    found org.apache.hbase#hbase-annotations;1.1.3 in central
    found org.apache.hbase#hbase-protocol;1.1.3 in central
    found org.apache.hadoop#hadoop-common;2.7.1 in central
    found org.apache.hadoop#hadoop-annotations;2.7.1 in central
    found org.apache.commons#commons-math3;3.1.1 in central
    found xmlenc#xmlenc;0.52 in central
    found commons-net#commons-net;3.1 in central
    found javax.servlet#servlet-api;2.5 in central
    found com.sun.jersey#jersey-json;1.9 in central
    found org.codehaus.jettison#jettison;1.1 in central
    found com.sun.xml.bind#jaxb-impl;2.2.3-1 in central
    found javax.xml.bind#jaxb-api;2.2.2 in central
    found javax.xml.stream#stax-api;1.0-2 in central
    found javax.activation#activation;1.1 in central
    found org.codehaus.jackson#jackson-xc;1.9.2 in central
    found net.java.dev.jets3t#jets3t;0.9.0 in central
    found org.apache.httpcomponents#httpcore;4.2.5 in central
    found com.jamesmurty.utils#java-xmlbuilder;0.4 in central
    found commons-configuration#commons-configuration;1.6 in central
    found commons-digester#commons-digester;1.8 in central
    found commons-beanutils#commons-beanutils;1.7.0 in central
    found commons-beanutils#commons-beanutils-core;1.8.0 in central
    found org.apache.hadoop#hadoop-auth;2.7.1 in central
    found org.apache.directory.server#apacheds-kerberos-codec;2.0.0-M15 in central
    found org.apache.directory.server#apacheds-i18n;2.0.0-M15 in central
    found org.apache.directory.api#api-asn1-api;1.0.0-M20 in central
    found org.apache.directory.api#api-util;1.0.0-M20 in central
    found org.apache.curator#curator-framework;2.7.1 in central
    found org.apache.curator#curator-client;2.7.1 in central
    found com.jcraft#jsch;0.1.42 in central
    found org.apache.curator#curator-recipes;2.7.1 in central
    found org.apache.hadoop#hadoop-mapreduce-client-core;2.7.1 in central
    found org.apache.hadoop#hadoop-yarn-common;2.7.1 in central
    found org.apache.hadoop#hadoop-yarn-api;2.7.1 in central
    found com.sun.jersey#jersey-client;1.9 in central
    found com.google.inject.extensions#guice-servlet;3.0 in central
    found com.sun.jersey.contribs#jersey-guice;1.9 in central
    found org.slf4j#slf4j-log4j12;1.7.10 in central
    found io.netty#netty;3.6.2.Final in central
    found javax.servlet.jsp#jsp-api;2.1 in central
:: resolution report :: resolve 27998ms :: artifacts dl 2975ms
    :: modules in use:
    aopalliance#aopalliance;1.0 from central in [default]
    asm#asm;3.1 from central in [default]
    ch.qos.logback#logback-classic;1.0.9 from central in [default]
    ch.qos.logback#logback-core;1.0.9 from central in [default]
    com.databricks#spark-avro_2.11;4.0.0 from central in [default]
    com.github.stephenc.findbugs#findbugs-annotations;1.3.9-1 from central in [default]
    com.github.stephenc.jcip#jcip-annotations;1.0-1 from central in [default]
    com.google.code.findbugs#jsr305;2.0.1 from central in [default]
    com.google.code.gson#gson;2.2.4 from central in [default]
    com.google.guava#guava;13.0.1 from central in [default]
    com.google.inject#guice;3.0 from central in [default]
    com.google.inject.extensions#guice-assistedinject;3.0 from central in [default]
    com.google.inject.extensions#guice-servlet;3.0 from central in [default]
    com.google.protobuf#protobuf-java;2.5.0 from central in [default]
    com.hortonworks#shc-core;1.1.1-2.1-s_2.11 from repo-1 in [default]
    com.jamesmurty.utils#java-xmlbuilder;0.4 from central in [default]
    com.jcraft#jsch;0.1.42 from central in [default]
    com.lmax#disruptor;3.3.0 from central in [default]
    com.sun.jersey#jersey-client;1.9 from central in [default]
    com.sun.jersey#jersey-core;1.9 from central in [default]
    com.sun.jersey#jersey-json;1.9 from central in [default]
    com.sun.jersey#jersey-server;1.9 from central in [default]
    com.sun.jersey.contribs#jersey-guice;1.9 from central in [default]
    com.sun.xml.bind#jaxb-impl;2.2.3-1 from central in [default]
    com.thoughtworks.paranamer#paranamer;2.3 from central in [default]
    com.yammer.metrics#metrics-core;2.2.0 from central in [default]
    commons-beanutils#commons-beanutils;1.7.0 from central in [default]
    commons-beanutils#commons-beanutils-core;1.8.0 from central in [default]
    commons-cli#commons-cli;1.2 from central in [default]
    commons-codec#commons-codec;1.9 from central in [default]
    commons-collections#commons-collections;3.2.2 from central in [default]
    commons-configuration#commons-configuration;1.6 from central in [default]
    commons-digester#commons-digester;1.8 from central in [default]
    commons-el#commons-el;1.0 from central in [default]
    commons-httpclient#commons-httpclient;3.1 from central in [default]
    commons-io#commons-io;2.4 from central in [default]
    commons-lang#commons-lang;2.6 from central in [default]
    commons-logging#commons-logging;1.2 from central in [default]
    commons-net#commons-net;3.1 from central in [default]
    io.dropwizard.metrics#metrics-core;3.1.0 from central in [default]
    io.netty#netty;3.6.2.Final from central in [default]
    io.netty#netty-all;4.0.23.Final from central in [default]
    it.unimi.dsi#fastutil;6.5.6 from central in [default]
    javax.activation#activation;1.1 from central in [default]
    javax.inject#javax.inject;1 from central in [default]
    javax.servlet#servlet-api;2.5 from central in [default]
    javax.servlet.jsp#jsp-api;2.1 from central in [default]
    javax.xml.bind#jaxb-api;2.2.2 from central in [default]
    javax.xml.stream#stax-api;1.0-2 from central in [default]
    jline#jline;2.11 from central in [default]
    joda-time#joda-time;1.6 from central in [default]
    junit#junit;4.12 from central in [default]
    log4j#log4j;1.2.17 from central in [default]
    net.java.dev.jets3t#jets3t;0.9.0 from central in [default]
    org.antlr#antlr-runtime;3.5.2 from central in [default]
    org.apache.avro#avro;1.7.6 from central in [default]
    org.apache.commons#commons-compress;1.4.1 from central in [default]
    org.apache.commons#commons-csv;1.0 from central in [default]
    org.apache.commons#commons-math;2.2 from central in [default]
    org.apache.commons#commons-math3;3.1.1 from central in [default]
    org.apache.curator#curator-client;2.7.1 from central in [default]
    org.apache.curator#curator-framework;2.7.1 from central in [default]
    org.apache.curator#curator-recipes;2.7.1 from central in [default]
    org.apache.directory.api#api-asn1-api;1.0.0-M20 from central in [default]
    org.apache.directory.api#api-util;1.0.0-M20 from central in [default]
    org.apache.directory.server#apacheds-i18n;2.0.0-M15 from central in [default]
    org.apache.directory.server#apacheds-kerberos-codec;2.0.0-M15 from central in [default]
    org.apache.hadoop#hadoop-annotations;2.7.1 from central in [default]
    org.apache.hadoop#hadoop-auth;2.7.1 from central in [default]
    org.apache.hadoop#hadoop-common;2.7.1 from central in [default]
    org.apache.hadoop#hadoop-mapreduce-client-core;2.7.1 from central in [default]
    org.apache.hadoop#hadoop-yarn-api;2.7.1 from central in [default]
    org.apache.hadoop#hadoop-yarn-common;2.7.1 from central in [default]
    org.apache.hbase#hbase-annotations;1.1.3 from central in [default]
    org.apache.hbase#hbase-client;1.1.2 from central in [default]
    org.apache.hbase#hbase-common;1.1.2 from central in [default]
    org.apache.hbase#hbase-prefix-tree;1.1.2 from central in [default]
    org.apache.hbase#hbase-procedure;1.1.2 from central in [default]
    org.apache.hbase#hbase-protocol;1.1.3 from central in [default]
    org.apache.hbase#hbase-server;1.1.2 from central in [default]
    org.apache.htrace#htrace-core;3.1.0-incubating from central in [default]
    org.apache.httpcomponents#httpclient;4.0.1 from central in [default]
    org.apache.httpcomponents#httpcore;4.2.5 from central in [default]
    org.apache.phoenix#phoenix-core;4.9.0-HBase-1.1 from central in [default]
    org.apache.tephra#tephra-api;0.9.0-incubating from central in [default]
    org.apache.tephra#tephra-core;0.9.0-incubating from central in [default]
    org.apache.tephra#tephra-hbase-compat-1.1;0.9.0-incubating from central in [default]
    org.apache.thrift#libthrift;0.9.0 from central in [default]
    org.apache.twill#twill-api;0.6.0-incubating from central in [default]
    org.apache.twill#twill-common;0.6.0-incubating from central in [default]
    org.apache.twill#twill-core;0.6.0-incubating from central in [default]
    org.apache.twill#twill-discovery-api;0.6.0-incubating from central in [default]
    org.apache.twill#twill-discovery-core;0.6.0-incubating from central in [default]
    org.apache.twill#twill-zookeeper;0.6.0-incubating from central in [default]
    org.apache.zookeeper#zookeeper;3.4.6 from central in [default]
    org.codehaus.jackson#jackson-core-asl;1.9.13 from central in [default]
    org.codehaus.jackson#jackson-jaxrs;1.9.13 from central in [default]
    org.codehaus.jackson#jackson-mapper-asl;1.9.13 from central in [default]
    org.codehaus.jackson#jackson-xc;1.9.2 from central in [default]
    org.codehaus.jettison#jettison;1.1 from central in [default]
    org.hamcrest#hamcrest-core;1.3 from central in [default]
    org.iq80.snappy#snappy;0.3 from central in [default]
    org.jamon#jamon-runtime;2.3.1 from central in [default]
    org.jruby.jcodings#jcodings;1.0.8 from central in [default]
    org.jruby.joni#joni;2.1.2 from central in [default]
    org.mortbay.jetty#jetty;6.1.26 from central in [default]
    org.mortbay.jetty#jetty-sslengine;6.1.26 from central in [default]
    org.mortbay.jetty#jetty-util;6.1.26 from central in [default]
    org.mortbay.jetty#jsp-2.1;6.1.14 from central in [default]
    org.mortbay.jetty#jsp-api-2.1;6.1.14 from central in [default]
    org.mortbay.jetty#servlet-api;2.5-20081211 from central in [default]
    org.mortbay.jetty#servlet-api-2.5;6.1.14 from central in [default]
    org.ow2.asm#asm-all;5.0.2 from central in [default]
    org.slf4j#slf4j-api;1.7.7 from central in [default]
    org.slf4j#slf4j-log4j12;1.7.10 from central in [default]
    org.sonatype.sisu.inject#cglib;2.2.1-v20090111 from central in [default]
    org.tukaani#xz;1.0 from central in [default]
    org.xerial.snappy#snappy-java;1.0.5 from central in [default]
    sqlline#sqlline;1.2.0 from central in [default]
    tomcat#jasper-compiler;5.5.23 from central in [default]
    tomcat#jasper-runtime;5.5.23 from central in [default]
    xmlenc#xmlenc;0.52 from central in [default]
    :: evicted modules:
    org.slf4j#slf4j-api;1.7.5 by [org.slf4j#slf4j-api;1.7.7] in [default]
    org.slf4j#slf4j-api;1.6.4 by [org.slf4j#slf4j-api;1.7.7] in [default]
    org.apache.hbase#hbase-protocol;1.1.2 by [org.apache.hbase#hbase-protocol;1.1.3] in [default]
    org.apache.hbase#hbase-annotations;1.1.2 by [org.apache.hbase#hbase-annotations;1.1.3] in [default]
    junit#junit;4.11 by [junit#junit;4.12] in [default]
    com.google.guava#guava;12.0.1 by [com.google.guava#guava;13.0.1] in [default]
    com.google.code.findbugs#jsr305;1.3.9 by [com.google.code.findbugs#jsr305;2.0.1] in [default]
    org.slf4j#slf4j-log4j12;1.6.1 by [org.slf4j#slf4j-log4j12;1.7.10] in [default]
    commons-collections#commons-collections;3.2.1 by [commons-collections#commons-collections;3.2.2] in [default]
    commons-lang#commons-lang;2.5 by [commons-lang#commons-lang;2.6] in [default]
    org.apache.httpcomponents#httpclient;4.1.3 by [org.apache.httpcomponents#httpclient;4.0.1] in [default]
    org.apache.httpcomponents#httpcore;4.1.3 by [org.apache.httpcomponents#httpcore;4.2.5] in [default]
    org.apache.zookeeper#zookeeper;3.4.5 by [org.apache.zookeeper#zookeeper;3.4.6] in [default]
    org.codehaus.jackson#jackson-core-asl;1.9.2 by [org.codehaus.jackson#jackson-core-asl;1.9.13] in [default]
    org.codehaus.jackson#jackson-mapper-asl;1.9.2 by [org.codehaus.jackson#jackson-mapper-asl;1.9.13] in [default]
    org.apache.httpcomponents#httpcore;4.0.1 by [org.apache.httpcomponents#httpcore;4.2.5] in [default]
    commons-codec#commons-codec;1.7 by [commons-codec#commons-codec;1.9] in [default]
    org.codehaus.jackson#jackson-jaxrs;1.9.2 by [org.codehaus.jackson#jackson-jaxrs;1.9.13] in [default]
    org.apache.httpcomponents#httpclient;4.2.5 by [org.apache.httpcomponents#httpclient;4.0.1] in [default]
    org.apache.avro#avro;1.7.4 by [org.apache.avro#avro;1.7.6] in [default]
    ---------------------------------------------------------------------
    |                  |            modules            ||   artifacts   |
    |       conf       | number| search|dwnlded|evicted|| number|dwnlded|
    ---------------------------------------------------------------------
    |      default     |  142  |   9   |   9   |   20  ||  122  |   0   |
    ---------------------------------------------------------------------
:: retrieving :: org.apache.spark#spark-submit-parent
    confs: [default]
    0 artifacts copied, 122 already retrieved (0kB/387ms)
18/07/12 03:02:08 WARN util.Utils: Your hostname, quickstart.cloudera resolves to a loopback address: 127.0.0.1; using 192.168.116.128 instead (on interface eth1)
18/07/12 03:02:08 WARN util.Utils: Set SPARK_LOCAL_IP if you need to bind to another address
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
[Stage 0:>                                                          (0 + 1) / 1]18/07/12 03:04:37 ERROR executor.Executor: Exception in task 0.0 in stage 0.0 (TID 0)
java.lang.NoSuchMethodError: org.apache.hadoop.hbase.client.Scan.setCaching(I)Lorg/apache/hadoop/hbase/client/Scan;
    at org.apache.spark.sql.execution.datasources.hbase.HBaseTableScanRDD.org$apache$spark$sql$execution$datasources$hbase$HBaseTableScanRDD$$buildScan(HBaseTableScan.scala:223)
    at org.apache.spark.sql.execution.datasources.hbase.HBaseTableScanRDD$$anonfun$8.apply(HBaseTableScan.scala:280)
    at org.apache.spark.sql.execution.datasources.hbase.HBaseTableScanRDD$$anonfun$8.apply(HBaseTableScan.scala:279)
    at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
    at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
    at scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:33)
    at scala.collection.mutable.ArrayOps$ofRef.foreach(ArrayOps.scala:186)
    at scala.collection.TraversableLike$class.map(TraversableLike.scala:234)
    at scala.collection.mutable.ArrayOps$ofRef.map(ArrayOps.scala:186)
    at org.apache.spark.sql.execution.datasources.hbase.HBaseTableScanRDD.compute(HBaseTableScan.scala:279)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:324)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:288)
    at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:324)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:288)
    at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:324)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:288)
    at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:324)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:288)
    at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:324)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:288)
    at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:87)
    at org.apache.spark.scheduler.Task.run(Task.scala:109)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:345)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)

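As far as I understand, the problem is that Hortonworks built shc-core against HBase 1.1.2 dependencies, but I am using HBase 1.2.0. The HBase 1.1.2 jars loaded from the central Maven repository may be missing some classes. Please correct me if I am wrong; I am not sure about the root cause of this error.

I found a discussion of this error: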

java.lang.NoSuchMethodError: org.apache.hadoop.hbase.client.Scan.setCaching(I)Lorg/apache/hadoop/hbase/client/Scan

here: https://github.com/hortonworks-spark/shc/issues/154
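To make the message easier to read, the following sketch decodes the JVM method descriptor that appears in a NoSuchMethodError. The descriptor `(I)Lorg/apache/hadoop/hbase/client/Scan;` means "takes one int, returns Scan", so the error says that the Scan class loaded at runtime has no setCaching method with exactly that parameter and return type. The helper names `parse_types` and `describe` are hypothetical and only cover the descriptor forms needed here.

```python
import re

# JVM descriptor codes for primitive types and void.
PRIMITIVES = {"B": "byte", "C": "char", "D": "double", "F": "float",
              "I": "int", "J": "long", "S": "short", "Z": "boolean",
              "V": "void"}

def parse_types(desc):
    """Parse a run of JVM type descriptors into readable Java type names."""
    types, i = [], 0
    while i < len(desc):
        dims = 0
        while desc[i] == "[":          # each '[' adds one array dimension
            dims += 1
            i += 1
        if desc[i] == "L":             # object type: L<internal/name>;
            end = desc.index(";", i)
            name = desc[i + 1:end].replace("/", ".")
            i = end + 1
        else:                          # single-letter primitive
            name = PRIMITIVES[desc[i]]
            i += 1
        types.append(name + "[]" * dims)
    return types

def describe(signature):
    """Render 'pkg.Class.method(args)ret' from the error as Java-like text."""
    m = re.fullmatch(
        r"(?P<owner>[\w.$]+)\.(?P<method>[\w$<>]+)\((?P<args>[^)]*)\)(?P<ret>.+)",
        signature)
    if m is None:
        raise ValueError("not a method signature: " + signature)
    args = parse_types(m.group("args"))
    ret = parse_types(m.group("ret"))[0]
    return "{} {}.{}({})".format(ret, m.group("owner"), m.group("method"),
                                 ", ".join(args))

text = describe("org.apache.hadoop.hbase.client.Scan.setCaching"
                "(I)Lorg/apache/hadoop/hbase/client/Scan;")
print(text)
```

The return type is part of the lookup, which is why a jar compiled against one HBase version can fail at runtime against another even when a method with the same name and arguments exists.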

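Can I solve this without building the connector locally? Some people have replied that rebuilding does not fix the problem. Or is there another way to read from HBase with PySpark?

Please advise why the error occurs when reading from HBase, and how to avoid it.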

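1 answer:

Answer 0 (score: 0)

This problem is usually caused by the installed version differing from the version used in the project, or by differences between the dependency sources. Check which HBase version your project actually uses.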
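One rough way to run that check is to scan the Ivy resolution output printed at startup for `org.apache.hbase` modules and compare their versions with the version installed on the cluster. The sketch below is only an illustration: the function name `mismatched_hbase_modules` is made up, the sample lines are copied from the log in the question, and the exact string comparison is deliberately crude (a vendor build may carry a suffix after the base version).

```python
import re

# Matches Ivy "modules in use" lines such as:
#   org.apache.hbase#hbase-client;1.1.2 from central in [default]
MODULE_RE = re.compile(
    r"^\s*(?P<org>[\w.\-]+)#(?P<name>[\w.\-]+);(?P<version>\S+)\s+from\s+")

def mismatched_hbase_modules(log_text, cluster_version):
    """Return {artifact: version} for resolved HBase jars that differ
    from the HBase version installed on the cluster."""
    mismatches = {}
    for line in log_text.splitlines():
        m = MODULE_RE.match(line)
        if m and m.group("org") == "org.apache.hbase":
            if m.group("version") != cluster_version:
                mismatches[m.group("name")] = m.group("version")
    return mismatches

# A few lines taken from the resolution report in the question.
sample = """\
    org.apache.hbase#hbase-client;1.1.2 from central in [default]
    org.apache.hbase#hbase-protocol;1.1.3 from central in [default]
    org.apache.zookeeper#zookeeper;3.4.6 from central in [default]
"""

result = mismatched_hbase_modules(sample, "1.2.0")
print(result)
```

Any artifact reported here was pulled in at a version other than the one the cluster runs, which is consistent with the mismatch the question suspects.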