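Thank you very much for your help. The closest bug report I have found is https://issues.apache.org/jira/browse/SPARK-7837. Has anyone else seen this problem? If you recognize the error in the stack trace below and know its cause, please let me know.
When I call df.repartition(1).saveAsParquetFile() or df.saveAsParquetFile(), the row data is not saved to the Parquet file, and I get the stack trace below: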
Name: org.apache.spark.SparkException
Message: Job aborted.
StackTrace: org.apache.spark.sql.sources.InsertIntoHadoopFsRelation.insert(commands.scala:166)
org.apache.spark.sql.sources.InsertIntoHadoopFsRelation.run(commands.scala:139)
org.apache.spark.sql.execution.ExecutedCommand.sideEffectResult$lzycompute(commands.scala:57)
org.apache.spark.sql.execution.ExecutedCommand.sideEffectResult(commands.scala:57)
org.apache.spark.sql.execution.ExecutedCommand.doExecute(commands.scala:68)
org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:88)
org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:88)
org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:147)
org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:87)
org.apache.spark.sql.SQLContext$QueryExecution.toRdd$lzycompute(SQLContext.scala:950)
org.apache.spark.sql.SQLContext$QueryExecution.toRdd(SQLContext.scala:950)
org.apache.spark.sql.sources.ResolvedDataSource$.apply(ddl.scala:336)
org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:144)
org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:135)
org.apache.spark.sql.DataFrame.saveAsParquetFile(DataFrame.scala:1508)
$line46.$read$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:22)
$line46.$read$$iwC$$iwC$$iwC$$iwC.<init>(<console>:27)
$line46.$read$$iwC$$iwC$$iwC.<init>(<console>:29)
$line46.$read$$iwC$$iwC.<init>(<console>:31)
$line46.$read$$iwC.<init>(<console>:33)
$line46.$read.<init>(<console>:35)
$line46.$read$.<init>(<console>:39)
$line46.$read$.<clinit>(<console>)
java.lang.J9VMInternals.initializeImpl(Native Method)
java.lang.J9VMInternals.initialize(J9VMInternals.java:235)
$line46.$eval$.<init>(<console>:7)
$line46.$eval$.<clinit>(<console>)
java.lang.J9VMInternals.initializeImpl(Native Method)
java.lang.J9VMInternals.initialize(J9VMInternals.java:235)
$line46.$eval.$print(<console>)
sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:95)
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:56)
java.lang.reflect.Method.invoke(Method.java:620)
org.apache.spark.repl.SparkIMain$ReadEvalPrint.call(SparkIMain.scala:1065)
org.apache.spark.repl.SparkIMain$Request.loadAndRun(SparkIMain.scala:1338)
org.apache.spark.repl.SparkIMain.loadAndRunReq$1(SparkIMain.scala:840)
org.apache.spark.repl.SparkIMain.interpret(SparkIMain.scala:871)
org.apache.spark.repl.SparkIMain.interpret(SparkIMain.scala:819)
com.ibm.spark.interpreter.ScalaInterpreter$$anonfun$interpretAddTask$1$$anonfun$apply$3.apply(ScalaInterpreter.scala:296)
com.ibm.spark.interpreter.ScalaInterpreter$$anonfun$interpretAddTask$1$$anonfun$apply$3.apply(ScalaInterpreter.scala:291)
com.ibm.spark.global.StreamState$.withStreams(StreamState.scala:80)
com.ibm.spark.interpreter.ScalaInterpreter$$anonfun$interpretAddTask$1.apply(ScalaInterpreter.scala:290)
com.ibm.spark.interpreter.ScalaInterpreter$$anonfun$interpretAddTask$1.apply(ScalaInterpreter.scala:290)
com.ibm.spark.utils.TaskManager$$anonfun$add$2$$anon$1.run(TaskManager.scala:123)
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1157)
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:627)
java.lang.Thread.run(Thread.java:801)
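Answer 0 (score: 0)
This issue is resolved in 1.6.0, and the IBM Bluemix service now provides Spark 1.6.0. Please create a new service and run a notebook with the same code in that new instance to resolve the problem.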
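Once the new instance is up, a quick check along the following lines may help confirm that the notebook is really on 1.6.0 and that a single-partition Parquet write goes through. This is only a sketch under a few assumptions: the object name `ParquetWriteSketch`, the helper `writeAndCount`, the two-row sample data, and the temporary output directory are all hypothetical, and the local master is used only so the example is self-contained (in a notebook, `sc` and `sqlContext` are normally already provided). It also uses `write.parquet` rather than `saveAsParquetFile`, since the latter has been deprecated since Spark 1.4.0 in favor of the `DataFrameWriter` API.

```scala
import java.nio.file.Files

import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.SQLContext

// Hypothetical helper object, for illustration only.
object ParquetWriteSketch {

  // Writes a tiny sample DataFrame as a single Parquet part file,
  // reads it back, and returns the number of rows that were read.
  def writeAndCount(sqlContext: SQLContext, path: String): Long = {
    import sqlContext.implicits._

    // Sample data; replace with your own DataFrame.
    val df = sqlContext.sparkContext
      .parallelize(Seq((1, "a"), (2, "b")))
      .toDF("id", "label")

    // DataFrameWriter replacement for the deprecated saveAsParquetFile().
    df.repartition(1).write.mode("overwrite").parquet(path)

    sqlContext.read.parquet(path).count()
  }

  def main(args: Array[String]): Unit = {
    // Local master is an assumption so the sketch runs standalone.
    val conf = new SparkConf().setAppName("parquet-write-sketch").setMaster("local[*]")
    val sc = new SparkContext(conf)
    try {
      val sqlContext = new SQLContext(sc)

      // Confirm which Spark version the instance is actually running.
      println(s"Spark version: ${sc.version}")

      // Hypothetical output location under a fresh temporary directory.
      val out = Files.createTempDirectory("parquet-sketch").resolve("out").toString
      println(s"Rows read back: ${writeAndCount(sqlContext, out)}")
    } finally {
      sc.stop()
    }
  }
}
```

In a notebook, evaluating `sc.version` and then calling `df.repartition(1).write.parquet(path)` on your own DataFrame should be enough; the surrounding setup above only exists to make the sketch runnable on its own.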
Thanks, Charles.