Spark 1.4.1 - using pyspark

Asked: 2015-09-04 05:34:14

Tags: apache-spark-sql

I am trying to run the following code and I am getting an error.

Code

    instances = sqlContext.sql("SELECT instance_id, instance_usage_code FROM ib_instances WHERE (instance_usage_code) = 'OUT_OF_ENTERPRISE'")

    instances.write.format("orc").save("instances2")

    hivectx.sql("""CREATE TABLE IF NOT EXISTS instances2 (instance_id STRING, instance_usage_code STRING)""")

    hivectx.sql("LOAD DATA LOCAL INPATH '/home/hduser/instances2' INTO TABLE instances2")

Error

    Traceback (most recent call last):
      File "/home/hduser/spark_script.py", line 57, in <module>
        instances.write.format("orc").save("instances2")
      File "/usr/local/spark-1.4.1-bin-hadoop2.6/python/lib/pyspark.zip/pyspark/sql/readwriter.py", line 304, in save
      File "/usr/local/spark-1.4.1-bin-hadoop2.6/python/lib/py4j-0.8.2.1-src.zip/py4j/java_gateway.py", line 538, in __call__
      File "/usr/local/spark-1.4.1-bin-hadoop2.6/python/lib/py4j-0.8.2.1-src.zip/py4j/protocol.py", line 300, in get_return_value
    py4j.protocol.Py4JJavaError: An error occurred while calling o55.save.
    : java.lang.AssertionError: assertion failed: The ORC data source can only be used with HiveContext.
        at scala.Predef$.assert(Predef.scala:179)
        at org.apache.spark.sql.hive.orc.DefaultSource.createRelation(OrcRelation.scala:54)
        at org.apache.spark.sql.sources.ResolvedDataSource$.apply(ddl.scala:322)
        at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:144)
        at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:135)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:606)
        at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:231)
        at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:379)
        at py4j.Gateway.invoke(Gateway.java:259)
        at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:133)
        at py4j.commands.CallCommand.execute(CallCommand.java:79)
        at py4j.GatewayConnection.run(GatewayConnection.java:207)
        at java.lang.Thread.run(Thread.java:745)

1 Answer:

Answer 0 (score: 1):

My guess is that you created a plain SQLContext rather than a Hive one (which adds several capabilities, including the ORC data source). Create sqlContext as an instance of HiveContext. The Scala version is:

val sqlContext = new HiveContext(sc)
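
Since the question uses pyspark, here is a minimal Python sketch of the same fix. It assumes `sc` is the existing SparkContext (as it is in the pyspark shell); the table and column names are simply taken from the question.

    # Build a HiveContext instead of a plain SQLContext; the ORC data source
    # requires it, as the assertion error above states.
    from pyspark.sql import HiveContext

    sqlContext = HiveContext(sc)

    # The same query as in the question, now issued through the HiveContext,
    # so the ORC write no longer trips the assertion.
    instances = sqlContext.sql("SELECT instance_id, instance_usage_code FROM ib_instances WHERE instance_usage_code = 'OUT_OF_ENTERPRISE'")
    instances.write.format("orc").save("instances2")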