When I rename the columns of a DataFrame in Spark 2.2 and then print its contents with show(), I get the following error:
18/01/04 12:05:37 WARN ScalaRowValueReader: Field 'cluster' is backed by an array but the associated Spark Schema does not reflect this;
(use es.read.field.as.array.include/exclude)
18/01/04 12:05:37 WARN ScalaRowValueReader: Field 'project' is backed by an array but the associated Spark Schema does not reflect this;
(use es.read.field.as.array.include/exclude)
18/01/04 12:05:37 WARN ScalaRowValueReader: Field 'client' is backed by an array but the associated Spark Schema does not reflect this;
(use es.read.field.as.array.include/exclude)
18/01/04 12:05:37 WARN ScalaRowValueReader: Field 'twitter_mentioned_user' is backed by an array but the associated Spark Schema does not reflect this;
(use es.read.field.as.array.include/exclude)
18/01/04 12:05:37 WARN ScalaRowValueReader: Field 'author' is backed by an array but the associated Spark Schema does not reflect this;
(use es.read.field.as.array.include/exclude)
18/01/04 12:05:37 WARN ScalaRowValueReader: Field 'cluster' is backed by an array but the associated Spark Schema does not reflect this;
(use es.read.field.as.array.include/exclude)
18/01/04 12:05:37 ERROR Executor: Exception in task 0.0 in stage 5.0 (TID 7)
scala.MatchError: Buffer(13145439) (of class scala.collection.convert.Wrappers$JListWrapper)
at org.apache.spark.sql.catalyst.CatalystTypeConverters$StringConverter$.toCatalystImpl(CatalystTypeConverters.scala:276)
at org.apache.spark.sql.catalyst.CatalystTypeConverters$StringConverter$.toCatalystImpl(CatalystTypeConverters.scala:275)
at org.apache.spark.sql.catalyst.CatalystTypeConverters$CatalystTypeConverter.toCatalyst(CatalystTypeConverters.scala:103)
at org.apache.spark.sql.catalyst.CatalystTypeConverters$$anonfun$createToCatalystConverter$2.apply(CatalystTypeConverters.scala:379)
at org.apache.spark.sql.execution.RDDConversions$$anonfun$rowToRowRdd$1$$anonfun$apply$3.apply(ExistingRDD.scala:61)
at org.apache.spark.sql.execution.RDDConversions$$anonfun$rowToRowRdd$1$$anonfun$apply$3.apply(ExistingRDD.scala:58)
at scala.collection.Iterator$$anon$11.next(Iterator.scala:409)
Caused by: scala.MatchError: Buffer(13145439) (of class scala.collection.convert.Wrappers$JListWrapper)
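The stack trace shows where things go wrong: Catalyst's StringConverter pattern-matches on String, but for the array-backed fields flagged in the warnings the Elasticsearch connector hands it a Java list (wrapped as a Scala Buffer), which falls through the match. A minimal pure-Scala sketch of that failure mode (the function below is an illustrative stand-in, not the actual Spark internals):

```scala
// Simplified stand-in for Catalyst's StringConverter: it only matches String,
// so any other runtime type (e.g. a Buffer from a wrapped java.util.List)
// throws scala.MatchError at runtime.
def toCatalystString(value: Any): String = value match {
  case s: String => s
  // no case for Seq/Buffer: Buffer(13145439) falls through and throws
}

val ok = toCatalystString("hello")

// Mirrors the failing call: the field is backed by an array, so the
// connector produces a Buffer where the schema promises a String.
val failed =
  try { toCatalystString(scala.collection.mutable.Buffer(13145439L)); false }
  catch { case _: scala.MatchError => true }
```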
I printed the schema, and it looks like this:
df_processed
.withColumn("srcId", toInt(df_processed("srcId")))
.withColumn("dstId", toInt(df_processed("dstId")))
.withColumn("attr", rand).printSchema()
Output:
root
|-- srcId: integer (nullable = true)
|-- dstId: integer (nullable = true)
|-- attr: double (nullable = false)
The error occurs when I run this code:
df_processed
.withColumn("srcId", toInt(df_processed("srcId")))
.withColumn("dstId", toInt(df_processed("dstId")))
.withColumn("attr", rand).show()
This happens when I add .withColumn("attr", rand), but it works when I use .withColumn("attr2", lit(0)) instead.
Update
df_processed.printSchema()
root
|-- srcId: double (nullable = true)
|-- dstId: double (nullable = true)
df_processed.show()
runs without an error.
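The warnings at the top of the log point at a likely fix: tell the connector which fields are arrays via es.read.field.as.array.include when loading the DataFrame, so the inferred schema matches the actual data. A sketch, assuming the data is read with the elasticsearch-spark connector (the index name "tweets" is a placeholder, not from the question):

```scala
// The option key and field names come straight from the warnings above;
// only the index name is a placeholder.
val esOptions = Map(
  "es.read.field.as.array.include" ->
    "cluster,project,client,twitter_mentioned_user,author"
)

// Assumed read path (requires a SparkSession and the elasticsearch-spark jar):
// val df_processed = spark.read
//   .format("org.elasticsearch.spark.sql")
//   .options(esOptions)
//   .load("tweets")
```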
Answer 0 (score: 0)
Here is an example similar to what you are trying to do. To convert the data type, you can use the cast function:
import org.apache.spark.sql.functions.rand
import org.apache.spark.sql.types.IntegerType
import spark.implicits._  // spark is your SparkSession; needed for toDF and $

val ds = Seq(
  (1.2, 3.5),
  (1.2, 3.5),
  (1.2, 3.5)
).toDF("srcId", "dstId")

ds.withColumn("srcId", $"srcId".cast(IntegerType))
  .withColumn("dstId", $"dstId".cast(IntegerType))
  .withColumn("attr", rand)
Hope this helps!
Answer 1 (score: 0)
You can add a UDF function:
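A sketch of what such a UDF could look like here: extract the first element of an array-backed field so the column matches the scalar schema. The helper name and the fallback value 0 are illustrative assumptions, not from the question:

```scala
// Plain function holding the conversion logic, kept separate so it is
// easy to test: take the first element, or 0 if the array is empty.
def firstOrZero(xs: Seq[Long]): Long = xs.headOption.getOrElse(0L)

// Wrapping it as a Spark UDF (assumes a SparkSession and
// import org.apache.spark.sql.functions.udf):
// val firstElem = udf(firstOrZero _)
// df_processed.withColumn("srcId", firstElem($"srcId"))
```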