Unable to insert data into HBase from pyspark

Time: 2019-07-17 02:23:54

Tags: pyspark apache-spark-sql hbase

I want to insert data into HBase from pyspark. Below is the code I implemented; when I try to write the data to HBase I get a NullPointerException. The pyspark shell was launched as follows:

  

pyspark --master yarn --deploy-mode client \
    --jars hbase-spark-2.1.0-cdh6.1.0.jar \
    --driver-class-path hbase-spark-2.1.0-cdh6.1.0.jar
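Since a connector jar that never reaches the driver JVM is a common cause of failures like this, one quick sanity probe is to ask the driver whether the connector's entry class is loadable. This is an illustrative check only; it uses the `sc` that the pyspark shell predefines and py4j's access to `java.lang.Class.forName`:

# Illustrative sanity probe: confirm the class shipped via --jars /
# --driver-class-path is visible to the driver JVM. `sc` is the
# SparkContext predefined by the pyspark shell.
try:
    sc._jvm.java.lang.Class.forName("org.apache.hadoop.hbase.spark.DefaultSource")
    print("hbase-spark connector is on the driver classpath")
except Exception as exc:
    print("connector class not loadable:", exc)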

Code:

from pyspark import SparkContext
from pyspark.sql import SQLContext

sc = SparkContext()   # in the pyspark shell, sc is already defined and can be reused
sqlc = SQLContext(sc)

# Data source name registered by the hbase-spark connector
data_source_format = 'org.apache.hadoop.hbase.spark'

df = sc.parallelize([('a', '1.0'), ('b', '2.0')]).toDF(schema=['col0', 'col1'])

# Catalog mapping DataFrame columns to the HBase row key and a column family;
# ''.join(...split()) strips all whitespace, leaving compact JSON.
catalog = ''.join("""{
"table":{"namespace":"default", "name":"testtable"},
"rowkey":"key",
"columns":{
    "col0":{"cf":"rowkey", "col":"key", "type":"string"},
    "col1":{"cf":"cf", "col":"col1", "type":"string"}
}
}""".split())

df.write.options(catalog=catalog, newtable=5).format(data_source_format).save()
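For what it's worth, the whitespace-stripping idiom above does yield valid JSON, so a malformed catalog is unlikely to be the trigger. A minimal standalone check (plain Python, no Spark required; the asserted values are simply those from the catalog above):

import json

# Rebuild the catalog exactly as above and confirm it still parses as JSON.
catalog = ''.join("""{
"table":{"namespace":"default", "name":"testtable"},
"rowkey":"key",
"columns":{
    "col0":{"cf":"rowkey", "col":"key", "type":"string"},
    "col1":{"cf":"cf", "col":"col1", "type":"string"}
}
}""".split())

parsed = json.loads(catalog)
assert parsed["table"]["name"] == "testtable"
assert parsed["columns"]["col0"]["cf"] == "rowkey"
print("catalog parses as valid JSON")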
  

Error:

     

df.write.options(catalog=catalog, newtable=5).format(data_source_format).save()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/app/hadoop/parcels/CDH-6.1.0-1.cdh6.1.0.p0.770702/lib/spark/python/pyspark/sql/readwriter.py", line 734, in save
    self._jwrite.save()
  File "/app/hadoop/parcels/CDH-6.1.0-1.cdh6.1.0.p0.770702/lib/spark/python/lib/py4j-0.10.7-src.zip/py4j/java_gateway.py", line 1257, in __call__
  File "/app/hadoop/parcels/CDH-6.1.0-1.cdh6.1.0.p0.770702/lib/spark/python/pyspark/sql/utils.py", line 63, in deco
    return f(*a, **kw)
  File "/app/hadoop/parcels/CDH-6.1.0-1.cdh6.1.0.p0.770702/lib/spark/python/lib/py4j-0.10.7-src.zip/py4j/protocol.py", line 328, in get_return_value
py4j.protocol.Py4JJavaError: An error occurred while calling o107.save.
: java.lang.NullPointerException
        at org.apache.hadoop.hbase.spark.HBaseRelation.<init>(DefaultSource.scala:139)
        at org.apache.hadoop.hbase.spark.DefaultSource.createRelation(DefaultSource.scala:79)
        at org.apache.spark.sql.execution.datasources.SaveIntoDataSourceCommand.run(SaveIntoDataSourceCommand.scala:45)
        at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult$lzycompute(commands.scala:70)
        at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult(commands.scala:68)
        at org.apache.spark.sql.execution.command.ExecutedCommandExec.doExecute(commands.scala:86)
        at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:131)
        at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:127)
        at org.apache.spark.sql.execution.SparkPlan$$anonfun$executeQuery$1.apply(SparkPlan.scala:155)
        at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
        at org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:152)
        at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:127)
        at org.apache.spark.sql.execution.QueryExecution.toRdd$lzycompute(QueryExecution.scala:80)
        at org.apache.spark.sql.execution.QueryExecution.toRdd(QueryExecution.scala:80)
        at org.apache.spark.sql.DataFrameWriter$$anonfun$runCommand$1.apply(DataFrameWriter.scala:668)
        at org.apache.spark.sql.DataFrameWriter$$anonfun$runCommand$1.apply(DataFrameWriter.scala:668)
        at org.apache.spark.sql.execution.SQLExecution$$anonfun$withNewExecutionId$1.apply(SQLExecution.scala:78)
        at org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:125)
        at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:73)
        at org.apache.spark.sql.DataFrameWriter.runCommand(DataFrameWriter.scala:668)
        at org.apache.spark.sql.DataFrameWriter.saveToV1Source(DataFrameWriter.scala:276)
        at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:270)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:498)
        at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
        at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
        at py4j.Gateway.invoke(Gateway.java:282)
        at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
        at py4j.commands.CallCommand.execute(CallCommand.java:79)
        at py4j.GatewayConnection.run(GatewayConnection.java:238)
        at java.lang.Thread.run(Thread.java:748)
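Note that the NullPointerException is raised in the HBaseRelation constructor (DefaultSource.scala:139) before any rows are written, which points at the connector's runtime environment rather than the data or the catalog. One workaround sometimes suggested for this connector, offered here only as an unverified sketch, is to instantiate an HBaseContext on the JVM side before calling save(), so the relation has a cached context to pick up. The class names and the three-argument constructor (SparkContext, Configuration, tmpHdfsConfgFile) are assumptions taken from the hbase-spark sources and should be checked against the exact connector version:

# Hedged workaround sketch, not a confirmed fix: manually create the
# connector's HBaseContext so it is cached JVM-side before the write.
# Constructor arguments are an assumption based on the hbase-spark module:
# (org.apache.spark.SparkContext, Hadoop Configuration, tmpHdfsConfgFile).
jvm = sc._jvm
hbase_conf = jvm.org.apache.hadoop.hbase.HBaseConfiguration.create()
jvm.org.apache.hadoop.hbase.spark.HBaseContext(sc._jsc.sc(), hbase_conf, None)

# Retry the write once the context exists.
df.write.options(catalog=catalog, newtable=5).format(data_source_format).save()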

Thanks.

0 Answers:

No answers yet.