Question

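I have these two Scala sequences (Spark schemas), and I need to check whether they are equal, ignoring the nullable flag of each column.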

import org.apache.spark.sql.Row
import org.apache.spark.sql.types._

val schemaA = StructType(Seq(StructField("date",DateType,true), StructField("account_name",StringType,true)))

val df_A = spark.createDataFrame(spark.sparkContext.emptyRDD[Row], schemaA)

val schemaB = StructType(Seq(StructField("date",DateType,false), StructField("account_name",StringType,true)))

val df_B = spark.createDataFrame(spark.sparkContext.emptyRDD[Row], schemaB)

In Python, I can simply do this:

 print(
     all(
         (a.name, a.dataType) == (b.name, b.dataType)
         for a, b in zip(df_A.schema, df_B.schema)
     )
 )

But I am stuck doing the same thing in Scala. Any hints?

Answer 1

Very similar to your Python solution:

val result: Boolean = schemaA.zip(schemaB).forall {
  case (a, b) => (a.name, a.dataType) == (b.name, b.dataType)
}

(There is no need to go through the DataFrames; the schemas can be compared directly.)

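Note that both this solution and the Python one can return true when one of the schemas has extra trailing fields that the other does not have, because zip stops at the shorter sequence and simply ignores them.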
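One way to guard against the extra-field case is to compare the lengths before zipping. The following is only a minimal sketch: the helper name `sameIgnoringNullable` and the third schema `schemaC` are made up for illustration and are not part of the question. It assumes Spark SQL is on the classpath; no SparkSession is needed.

```scala
import org.apache.spark.sql.types._

// Hypothetical helper: true when both schemas have the same number of fields
// and each pair of fields agrees on name and data type (nullable is ignored).
def sameIgnoringNullable(a: StructType, b: StructType): Boolean =
  a.length == b.length && a.zip(b).forall {
    case (x, y) => x.name == y.name && x.dataType == y.dataType
  }

val schemaA = StructType(Seq(
  StructField("date", DateType, true),
  StructField("account_name", StringType, true)))

// Same as schemaA except that "date" is not nullable.
val schemaB = StructType(Seq(
  StructField("date", DateType, false),
  StructField("account_name", StringType, true)))

// Illustrative schema with one extra trailing column.
val schemaC = StructType(schemaA.fields :+ StructField("extra", IntegerType, true))

println(sameIgnoringNullable(schemaA, schemaB)) // true
println(sameIgnoringNullable(schemaA, schemaC)) // false
```

This only ignores the top-level `nullable` flag. Nullability nested inside a data type (for example `containsNull` in an `ArrayType`, or fields of a nested `StructType`) is still part of `dataType` equality.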

Answer 2

Another way, which also addresses the "extra column" problem mentioned in the comments:

val result = schemaA.map { a => a.name -> a.dataType } == schemaB.map { b => b.name -> b.dataType }
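Comparing the mapped sequences handles extra columns because equality between two Scala sequences requires the same length as well as equal elements in the same order. A small sketch of this, using the `name` and `dataType` members of `StructField`; the helper `nameTypePairs` and the schema `withExtra` are hypothetical names introduced only for this example:

```scala
import org.apache.spark.sql.types._

// Hypothetical helper: project a schema onto (name, dataType) pairs, dropping nullable.
def nameTypePairs(s: StructType): Seq[(String, DataType)] =
  s.map(f => f.name -> f.dataType)

val left = StructType(Seq(
  StructField("date", DateType, true),
  StructField("account_name", StringType, true)))

// Differs from `left` only in the nullable flag of "date".
val right = StructType(Seq(
  StructField("date", DateType, false),
  StructField("account_name", StringType, true)))

// Illustrative schema with one extra trailing column.
val withExtra = StructType(left.fields :+ StructField("extra", IntegerType, true))

println(nameTypePairs(left) == nameTypePairs(right))     // true
println(nameTypePairs(left) == nameTypePairs(withExtra)) // false
```

The comparison is order-sensitive, so two schemas with the same columns in a different order are reported as different.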

Scala: loop over 2 sequences at the same time
