I have a static DataFrame. How can I write it to the console without using df.show()?
import org.apache.spark.SparkConf
import org.apache.spark.sql.{Row, SparkSession}
import org.apache.spark.sql.types.{BooleanType, IntegerType, StringType, StructField, StructType}

val sparkConfig = new SparkConf().setAppName("streaming-vertica").setMaster("local[2]")
val sparkSession = SparkSession.builder().master("local[2]").config(sparkConfig).getOrCreate()
val sc = sparkSession.sparkContext

val rows = sc.parallelize(Array(
  Row(1, "hello", true),
  Row(2, "goodbye", false)
))

val schema = StructType(Array(
  StructField("id", IntegerType, false),
  StructField("sings", StringType, true),
  StructField("still_here", BooleanType, true)
))

val df = sparkSession.createDataFrame(rows, schema)

df.write
  .format("console")
  .mode("append")
This writes nothing to the console:
Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
20/04/27 00:30:59 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Process finished with exit code 0
When using save:
df.write
  .format("console")
  .mode("append")
  .save()
it gives:
Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
20/04/27 00:45:01 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Exception in thread "main" java.lang.RuntimeException: org.apache.spark.sql.execution.streaming.ConsoleSinkProvider does not allow create table as select.
  at scala.sys.package$.error(package.scala:27)
  at org.apache.spark.sql.execution.datasources.DataSource.write(DataSource.scala:473)
  at org.apache.spark.sql.execution.datasources.SaveIntoDataSourceCommand.run(SaveIntoDataSourceCommand.scala:50)
  at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult$lzycompute(commands.scala:58)
  at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult(commands.scala:56)
  at org.apache.spark.sql.execution.command.ExecutedCommandExec.doExecute(commands.scala:74)
  at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:117)
  at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:117)
  at org.apache.spark.sql.execution.SparkPlan$$anonfun$executeQuery$1.apply(SparkPlan.scala:138)
  at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
  at org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:135)
  at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:116)
  at org.apache.spark.sql.execution.QueryExecution.toRdd$lzycompute(QueryExecution.scala:92)
  at org.apache.spark.sql.execution.QueryExecution.toRdd(QueryExecution.scala:92)
  at org.apache.spark.sql.DataFrameWriter.runCommand(DataFrameWriter.scala:609)
  at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:233)
  at rep.StaticDFWrite$.main(StaticDFWrite.scala:35)
  at rep.StaticDFWrite.main(StaticDFWrite.scala)
Spark version = 2.2.1
Scala version = 2.11.12
Answer:
You have to call save on the DataFrameWriter object. Without save, it will only create a DataFrameWriter object; nothing is executed, and your application terminates.
Check the code below; I have verified it in spark-shell.
Note that this code works on Spark version 2.4.0, but not on 2.2.0.
The console format does not work for batch writes in Spark 2.2.0: https://issues.apache.org/jira/browse/SPARK-20599
scala> df.write.format("console").mode("append")
res5: org.apache.spark.sql.DataFrameWriter[org.apache.spark.sql.Row] = org.apache.spark.sql.DataFrameWriter@148a3112
scala> df.write.format("console").mode("append").save()
+--------+---+
| name|age|
+--------+---+
|srinivas| 20|
+--------+---+
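Since you are on Spark 2.2.1, where the console format is not available for batch writes, a minimal workaround (a sketch that prints from the driver, not the console sink itself) is to collect the rows and print them yourself, assuming the DataFrame is small enough to fit in driver memory:

// Collects all rows to the driver and prints them; only safe for small
// DataFrames, since collect() materializes every row in driver memory.
df.collect().foreach(println)

// Or print one formatted line per row; Row.mkString joins the column values.
df.collect().foreach(row => println(row.mkString(" | ")))

This works on any Spark 2.x version and writes to the driver's stdout, similar in spirit to what show() does, without calling show() itself.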