Why does df.write with the console format not display anything?

时间:2020-04-26 19:05:07

标签: scala dataframe apache-spark spark-streaming

I have a static DataFrame. How can I write it to the console without using df.show()?

import org.apache.spark.SparkConf
import org.apache.spark.sql.{Row, SparkSession}
import org.apache.spark.sql.types.{BooleanType, IntegerType, StringType, StructField, StructType}

val sparkConfig = new SparkConf().setAppName("streaming-vertica").setMaster("local[2]")
val sparkSession = SparkSession.builder().master("local[2]").config(sparkConfig).getOrCreate()
val sc = sparkSession.sparkContext

val rows = sc.parallelize(Array(
  Row(1,"hello", true),
  Row(2,"goodbye", false)
))

val schema = StructType(Array(
  StructField("id",IntegerType, false),
  StructField("sings",StringType,true),
  StructField("still_here",BooleanType,true)
))

val df = sparkSession.createDataFrame(rows, schema) 

df.write
  .format("console")
  .mode("append")

This writes nothing to the console:

 Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
20/04/27 00:30:59 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable

Process finished with exit code 0

When using save():

   df.write
      .format("console")
      .mode("append")
      .save()

it gives:

Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
20/04/27 00:45:01 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Exception in thread "main" java.lang.RuntimeException: org.apache.spark.sql.execution.streaming.ConsoleSinkProvider does not allow create table as select.
    at scala.sys.package$.error(package.scala:27)
    at org.apache.spark.sql.execution.datasources.DataSource.write(DataSource.scala:473)
    at org.apache.spark.sql.execution.datasources.SaveIntoDataSourceCommand.run(SaveIntoDataSourceCommand.scala:50)
    at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult$lzycompute(commands.scala:58)
    at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult(commands.scala:56)
    at org.apache.spark.sql.execution.command.ExecutedCommandExec.doExecute(commands.scala:74)
    at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:117)
    at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:117)
    at org.apache.spark.sql.execution.SparkPlan$$anonfun$executeQuery$1.apply(SparkPlan.scala:138)
    at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
    at org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:135)
    at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:116)
    at org.apache.spark.sql.execution.QueryExecution.toRdd$lzycompute(QueryExecution.scala:92)
    at org.apache.spark.sql.execution.QueryExecution.toRdd(QueryExecution.scala:92)
    at org.apache.spark.sql.DataFrameWriter.runCommand(DataFrameWriter.scala:609)
    at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:233)
    at rep.StaticDFWrite$.main(StaticDFWrite.scala:35)
    at rep.StaticDFWrite.main(StaticDFWrite.scala)

Spark version = 2.2.1
Scala version = 2.11.12

1 answer:

Answer 0 (score: 0):

You have to call save on the DataFrameWriter object.

Without save, the code only builds a DataFrameWriter object; no job is triggered, and the application simply exits.
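For comparison, here is a minimal sketch (assuming Spark 2.4+ and a hypothetical output path /tmp/df-out) of the same lazy behaviour with a file format: the format and mode calls only configure the writer, and nothing runs until save is called.

    // Building a writer has no side effects: this line writes nothing.
    val writer = df.write.format("parquet").mode("append")

    // Only the save call triggers a Spark job and writes the files.
    writer.save("/tmp/df-out")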

Check the code below; I have verified it in spark-shell.

Note that this code works on Spark version 2.4.0 but not on 2.2.0.

The console format does not work for batch queries in Spark 2.2.0 (see https://issues.apache.org/jira/browse/SPARK-20599).

scala> df.write.format("console").mode("append")
res5: org.apache.spark.sql.DataFrameWriter[org.apache.spark.sql.Row] = org.apache.spark.sql.DataFrameWriter@148a3112

scala> df.write.format("console").mode("append").save()
+--------+---+
|    name|age|
+--------+---+
|srinivas| 20|
+--------+---+
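If you are stuck on 2.2.x, a minimal workaround sketch (not the console sink itself, just printing from the driver) is:

    // Workaround for Spark 2.2.x, where format("console") is only wired up
    // for Structured Streaming: print the batch DataFrame directly.
    df.show(20, truncate = false)

    // For small DataFrames only: collect to the driver and print each row.
    df.collect().foreach(println)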