SparkSQL / DataFrames not working in spark-shell or in an application

Time: 2016-08-10 06:14:57

Tags: apache-spark dataframe hbase apache-spark-sql

When I try to follow the example in the HBase Guide, it does not work. It fails with a java.lang.NullPointerException; the full output is below. My guess is that some object is null and a method is being invoked on it, but I cannot tell which object is null. Can anyone help me? Many thanks.

Update: Thanks everyone, the problem is now solved. I debugged this code in an application using the IntelliJ IDE and found that HBaseContext had never been instantiated; once I created an HBaseContext object, it worked fine, like this: val hbaseContext = new HBaseContext(sc, config, null). A short sketch of the fix follows the quoted documentation below.

Reference: https://hbase.apache.org/book.html#_basic_spark

The root of all Spark and HBase integration is the HBaseContext. The HBaseContext takes in HBase configurations and pushes them to the Spark executors. This allows us to have an HBase connection per Spark Executor in a static location.
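For completeness, here is a minimal sketch of the fix as I run it in spark-shell. It assumes the same hbase-spark snapshot jar, and the same catalog and data values defined in the transcript further down; the HBaseContext constructor arguments follow the form shown in the HBase reference guide and may differ in other snapshot versions.

    // Imports needed for the configuration, the HBaseContext and the catalog keys.
    import org.apache.hadoop.hbase.HBaseConfiguration
    import org.apache.hadoop.hbase.spark.HBaseContext
    import org.apache.spark.sql.datasources.hbase.HBaseTableCatalog

    // Picks up hbase-site.xml (ZooKeeper quorum, etc.) from the classpath.
    val config = HBaseConfiguration.create()

    // Instantiating HBaseContext is the missing step: it makes the context
    // available to the org.apache.hadoop.hbase.spark data source. Without it,
    // HBaseRelation.<init> fails with the NullPointerException shown below.
    val hbaseContext = new HBaseContext(sc, config, null)

    // The write that previously failed now succeeds.
    // (Outside spark-shell, toDF also needs: import sqlContext.implicits._)
    sc.parallelize(data).toDF
      .write
      .options(Map(HBaseTableCatalog.tableCatalog -> catalog,
                   HBaseTableCatalog.newTable -> "5"))
      .format("org.apache.hadoop.hbase.spark")
      .save()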

hadoop@master:~$ spark-1.6.0-bin-hadoop2.4/bin/spark-shell --jars /home/yang/Downloads/hbase-spark-2.0.0-20160316.173537-2.jar 
...
SQL context available as sqlContext.
scala> def catalog = s"""{
     |        |"table":{"namespace":"default", "name":"table1"},
     |        |"rowkey":"key",
     |        |"columns":{
     |          |"col0":{"cf":"rowkey", "col":"key", "type":"string"},
     |          |"col1":{"cf":"cf1", "col":"col1", "type":"string"}
     |        |}
     |      |}""".stripMargin
catalog: String

scala> case class HBaseRecord(
     |    col0: String,
     |    col1: String)
defined class HBaseRecord

scala> val data = (0 to 255).map { i =>  HBaseRecord(i.toString, "extra")}
data: scala.collection.immutable.IndexedSeq[HBaseRecord] = Vector(HBaseRecord(0,extra), HBaseRecord(1,extra), HBaseRecord(2,extra), HBaseRecord(3,extra), HBaseRecord(4,extra), HBaseRecord(5,extra), HBaseRecord(6,extra), HBaseRecord(7,extra), HBaseRecord(8,extra), HBaseRecord(9,extra), HBaseRecord(10,extra), HBaseRecord(11,extra), HBaseRecord(12,extra), HBaseRecord(13,extra), HBaseRecord(14,extra), HBaseRecord(15,extra), HBaseRecord(16,extra), HBaseRecord(17,extra), HBaseRecord(18,extra), HBaseRecord(19,extra), HBaseRecord(20,extra), HBaseRecord(21,extra), HBaseRecord(22,extra), HBaseRecord(23,extra), HBaseRecord(24,extra), HBaseRecord(25,extra), HBaseRecord(26,extra), HBaseRecord(27,extra), HBaseRecord(28,extra), HBaseRecord(29,extra), HBaseRecord(30,extra), HBaseRecord(31,extra), HBase...
scala> import org.apache.spark.sql.datasources.hbase
import org.apache.spark.sql.datasources.hbase

scala> sc.parallelize(data).toDF.write.options(Map(HBaseTableCatalog.tableCatalog -> catalog, HBaseTableCatalog.newTable -> "5")).format("org.apache.hadoop.hbase.spark").save()
<console>:35: error: not found: value HBaseTableCatalog
              sc.parallelize(data).toDF.write.options(Map(HBaseTableCatalog.tableCatalog -> catalog, HBaseTableCatalog.newTable -> "5")).format("org.apache.hadoop.hbase.spark").save()
                                                          ^ 

scala> import org.apache.spark.sql.datasources.hbase.{HBaseTableCatalog}
import org.apache.spark.sql.datasources.hbase.HBaseTableCatalog

scala> sc.parallelize(data).toDF.write.options(Map(HBaseTableCatalog.tableCatalog -> catalog, HBaseTableCatalog.newTable -> "5")).format("org.apache.hadoop.hbase.spark").save()
java.lang.NullPointerException
    at org.apache.hadoop.hbase.spark.HBaseRelation.<init>(DefaultSource.scala:125)
    at org.apache.hadoop.hbase.spark.DefaultSource.createRelation(DefaultSource.scala:74)
    at org.apache.spark.sql.execution.datasources.ResolvedDataSource$.apply(ResolvedDataSource.scala:222)
    at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:148)
    at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:36)
    at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:41)
    at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:43)
    at $iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:45)
    at $iwC$$iwC$$iwC$$iwC.<init>(<console>:47)
    at $iwC$$iwC$$iwC.<init>(<console>:49)
    at $iwC$$iwC.<init>(<console>:51)
    at $iwC.<init>(<console>:53)
    at <init>(<console>:55)
    at .<init>(<console>:59)
    at .<clinit>(<console>)
    at .<init>(<console>:7)
    at .<clinit>(<console>)
    at $print(<console>)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at org.apache.spark.repl.SparkIMain$ReadEvalPrint.call(SparkIMain.scala:1065)
    at org.apache.spark.repl.SparkIMain$Request.loadAndRun(SparkIMain.scala:1346)
    at org.apache.spark.repl.SparkIMain.loadAndRunReq$1(SparkIMain.scala:840)
    at org.apache.spark.repl.SparkIMain.interpret(SparkIMain.scala:871)
    at org.apache.spark.repl.SparkIMain.interpret(SparkIMain.scala:819)
    at org.apache.spark.repl.SparkILoop.reallyInterpret$1(SparkILoop.scala:857)
    at org.apache.spark.repl.SparkILoop.interpretStartingWith(SparkILoop.scala:902)
    at org.apache.spark.repl.SparkILoop.command(SparkILoop.scala:814)
    at org.apache.spark.repl.SparkILoop.processLine$1(SparkILoop.scala:657)
    at org.apache.spark.repl.SparkILoop.innerLoop$1(SparkILoop.scala:665)
    at org.apache.spark.repl.SparkILoop.org$apache$spark$repl$SparkILoop$$loop(SparkILoop.scala:670)
    at org.apache.spark.repl.SparkILoop$$anonfun$org$apache$spark$repl$SparkILoop$$process$1.apply$mcZ$sp(SparkILoop.scala:997)
    at org.apache.spark.repl.SparkILoop$$anonfun$org$apache$spark$repl$SparkILoop$$process$1.apply(SparkILoop.scala:945)
    at org.apache.spark.repl.SparkILoop$$anonfun$org$apache$spark$repl$SparkILoop$$process$1.apply(SparkILoop.scala:945)
    at scala.tools.nsc.util.ScalaClassLoader$.savingContextLoader(ScalaClassLoader.scala:135)
    at org.apache.spark.repl.SparkILoop.org$apache$spark$repl$SparkILoop$$process(SparkILoop.scala:945)
    at org.apache.spark.repl.SparkILoop.process(SparkILoop.scala:1059)
    at org.apache.spark.repl.Main$.main(Main.scala:31)
    at org.apache.spark.repl.Main.main(Main.scala)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:731)
    at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:181)
    at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:206)
    at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:121)
    at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)


scala> 
By the way, here is my environment information:

  • Spark: version 1.6.0
  • Scala: version 2.10.5 (OpenJDK 64-Bit Server VM, Java 1.8.0_91)
  • HBase: version 1.2.2
  • Hadoop: version 2.4.0

0 Answers:

No answers