java.lang.NoSuchMethodError when reading HBase with Spark: org.apache.hadoop.mapreduce.InputSplit.getLocationInfo error

Time: 2018-11-02 16:17:26

Tags: scala apache-spark hadoop hbase

I want to read HBase through Spark using Scala, but I get this error:

Exception in thread "dag-scheduler-event-loop" java.lang.NoSuchMethodError: org.apache.hadoop.mapreduce.InputSplit.getLocationInfo()[Lorg/apache/hadoop/mapred/SplitLocationInfo;

But I have already added the dependencies, and this problem keeps bothering me. My environment is as follows:

  • Scala: 2.11.12
  • Spark: 2.3.1
  • HBase: maybe 2.1.0 (I am not sure)
  • Hadoop: 2.7.2.4

My build.sbt is:

libraryDependencies ++= Seq(
    "org.apache.spark" % "spark-core_2.11" % "2.3.1",
    "org.apache.spark" % "spark-sql_2.11" % "2.3.1",
    "org.apache.spark" % "spark-streaming_2.11" % "2.3.1",
    "org.apache.spark" % "spark-streaming-kafka-0-10_2.11" % "2.3.1",
    "org.apache.spark" % "spark-yarn_2.11" % "2.3.1",
    "org.apache.hadoop" % "hadoop-core" % "2.6.0-mr1-cdh5.15.1",
    "org.apache.hadoop" % "hadoop-common" % "2.7.2",
    "org.apache.hadoop" % "hadoop-client" % "2.7.2",
    "org.apache.hadoop" % "hadoop-mapred" % "0.22.0",
    "org.apache.hadoop" % "hadoop-nfs" % "2.7.2",
    "org.apache.hadoop" % "hadoop-hdfs" % "2.7.2",
    "org.apache.hadoop" % "hadoop-hdfs-nfs" % "2.7.2",
    "org.apache.hadoop" % "hadoop-mapreduce-client-core" % "2.7.2",
    "org.apache.hadoop" % "hadoop-mapreduce" % "2.7.2",
    "org.apache.hadoop" % "hadoop-mapreduce-client" % "2.7.2",
    "org.apache.hadoop" % "hadoop-mapreduce-client-common" % "2.7.2",
    "org.apache.hbase" % "hbase" % "2.1.0",
    "org.apache.hbase" % "hbase-server" % "2.1.0",
    "org.apache.hbase" % "hbase-common" % "2.1.0",
    "org.apache.hbase" % "hbase-client" % "2.1.0",
    "org.apache.hbase" % "hbase-protocol" % "2.1.0",
    "org.apache.hbase" % "hbase-metrics" % "2.1.0",
    "org.apache.hbase" % "hbase-metrics-api" % "2.1.0",
    "org.apache.hbase" % "hbase-mapreduce" % "2.1.0",
    "org.apache.hbase" % "hbase-zookeeper" % "2.1.0",
    "org.apache.hbase" % "hbase-hadoop-compat" % "2.1.0",
    "org.apache.hbase" % "hbase-hadoop2-compat" % "2.1.0",
    "org.apache.hbase" % "hbase-spark" % "2.1.0-cdh6.1.0"
)

I really don't know what is wrong. If I have added a wrong dependency or need to add a new one, please tell me where I can download it, for example: resolvers += "Apache HBase" at "https://repository.apache.org/content/repositories/releases"

Please help me, thank you!

2 answers:

Answer 0 (score: 1)

Can you give more details about how you are running your Spark job? If you are using a custom distribution (for example Cloudera or Hortonworks), you may have to compile against their libraries, and spark-submit will submit the job to the cluster using the classpath installed with the distribution.

To get started, add % "provided" to the Spark libraries in your sbt file, so that the job uses the specific libraries from the Spark installation's classpath at runtime.
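
A minimal sketch of what that could look like, assuming the versions from the question (which artifacts you mark as provided depends on what your cluster actually ships):

libraryDependencies ++= Seq(
    // compiled against at build time, but supplied by the cluster's Spark installation at runtime
    "org.apache.spark" % "spark-core_2.11"      % "2.3.1" % "provided",
    "org.apache.spark" % "spark-sql_2.11"       % "2.3.1" % "provided",
    "org.apache.spark" % "spark-streaming_2.11" % "2.3.1" % "provided",
    "org.apache.spark" % "spark-yarn_2.11"      % "2.3.1" % "provided"
)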

Answer 1 (score: 1)

You need to fix these versions to match the Hadoop version you are actually running, otherwise you can run into classpath/method problems. Specifically, your error comes from the mapreduce package:

"org.apache.hadoop" % "hadoop-core" % "2.6.0-mr1-cdh5.15.1",
"org.apache.hadoop" % "hadoop-mapred" % "0.22.0",

Spark already bundles most of Hadoop itself, so it is not clear why you are specifying these dependencies yourself; at the very least, use % "provided" on some of them.

As for hbase-spark, I doubt you want the cdh6 dependency, because CDH 6 is built on the Hadoop 3 libraries rather than 2.7.2.
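
Putting both points together, a rough sketch of a more consistent dependency block, assuming a plain Apache Hadoop 2.7.x / Spark 2.3.1 cluster (the exact set of HBase artifacts depends on your code; treat this as a starting point, not a definitive list):

libraryDependencies ++= Seq(
    // Spark and Hadoop are provided by the cluster at runtime
    "org.apache.spark"  % "spark-core_2.11" % "2.3.1" % "provided",
    "org.apache.spark"  % "spark-sql_2.11"  % "2.3.1" % "provided",
    "org.apache.hadoop" % "hadoop-client"   % "2.7.2" % "provided",
    // HBase 2.1.0 client-side artifacts; hbase-mapreduce contains TableInputFormat
    "org.apache.hbase"  % "hbase-client"    % "2.1.0",
    "org.apache.hbase"  % "hbase-common"    % "2.1.0",
    "org.apache.hbase"  % "hbase-mapreduce" % "2.1.0"
)

The inconsistent hadoop-core (CDH5 MR1), hadoop-mapred 0.22.0 and hbase-spark 2.1.0-cdh6.1.0 entries are dropped entirely, since mixing MR1, Hadoop 0.22 and Hadoop 3 based jars on one classpath is a likely source of the NoSuchMethodError.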
