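I have stored the default column names in a table, and I want to match those stored column names against the column names I will receive in a CSV file.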
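The code below is meant to do the following:
if the file has the same column names as those stored in the table, do some processing; otherwise, exit and send a schema-mismatch email.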
Here is my code:
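import scala.util.control.Breaks._

// The table stores one column name per row, so collect the row values
// (calling .columns here would only return the query's own header, Array("columnname"))
val expectedschemadf = spark.sql("SELECT columnname FROM table").collect().map(_.getString(0))
val receivedschemadf = spark.table(vendorfile.toString).columns
if (expectedschemadf.size == receivedschemadf.size) {
  breakable {
    for (i <- 0 until expectedschemadf.size) {
      // stop at the first expected column that the file does not have
      if (!receivedschemadf.contains(expectedschemadf(i))) {
        print("fail")
        break()
      }
    }
  }
} else {
  print("fail")
}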
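The loop above checks two things: both arrays have the same size, and every expected name appears among the received names. A minimal sketch, assuming both sides are already plain `Array[String]` values, expresses the same check with the built-in `forall` and `contains`; the helper names `columnsMatch` and `columnsMatchExactly` are hypothetical.

```scala
// Hypothetical helper: the same check as the for/breakable loop, using built-in collection methods.
def columnsMatch(expected: Array[String], received: Array[String]): Boolean =
  expected.length == received.length && expected.forall(name => received.contains(name))

// Hypothetical stricter variant: both sides must hold exactly the same names,
// including how often each one appears (column order is still ignored).
def columnsMatchExactly(expected: Array[String], received: Array[String]): Boolean =
  expected.sorted.sameElements(received.sorted)

println(columnsMatch(Array("name", "address", "zip"), Array("zip", "name", "address")))     // true
println(columnsMatch(Array("name", "address", "zip"), Array("name", "address", "zipcode"))) // false
println(columnsMatch(Array("a", "a", "b"), Array("a", "b", "b")))        // true: duplicates slip through
println(columnsMatchExactly(Array("a", "a", "b"), Array("a", "b", "b"))) // false
```

The loop-equivalent version is enough when column names are unique, which is normally the case for a DataFrame; the sorted comparison only matters if duplicated names are possible.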
What I want:
I would like to replace the for loop above with some predefined (built-in) function.
Answer 0 (score: 0)
Below is sample code that compares the schemas of two DataFrames:
scala> val df1 = Seq((1,"a", 1.5)).toDF
df1: org.apache.spark.sql.DataFrame = [_1: int, _2: string ... 1 more field]
scala> df1.printSchema
root
|-- _1: integer (nullable = false)
|-- _2: string (nullable = true)
|-- _3: double (nullable = false)
scala> val df2 = Seq((100,"x", 1231)).toDF
df2: org.apache.spark.sql.DataFrame = [_1: int, _2: string ... 1 more field]
scala> df2.printSchema
root
|-- _1: integer (nullable = false)
|-- _2: string (nullable = true)
|-- _3: integer (nullable = false)
scala> df1.schema == df2.schema
res7: Boolean = false
scala> val df3 = Seq((100,"x", 123.1)).toDF
df3: org.apache.spark.sql.DataFrame = [_1: int, _2: string ... 1 more field]
scala> df3.printSchema
root
|-- _1: integer (nullable = false)
|-- _2: string (nullable = true)
|-- _3: double (nullable = false)
scala> df1.schema == df3.schema
res9: Boolean = true
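`schema ==` compares column names, data types and nullability, in order, so `df1` and `df2` above differ only because `_3` is `double` in one and `integer` in the other. When only the names matter, as in the question, comparing `df.columns` is closer to the goal. Note that `columns` returns an `Array[String]`, and `==` on Scala arrays compares references rather than contents, while `sameElements` compares the elements in order. The sketch below uses plain arrays as hypothetical stand-ins for `df1.columns` and `df2.columns` so that it runs without Spark.

```scala
// Hypothetical stand-ins for df1.columns and df2.columns: same names, in the same order.
val cols1: Array[String] = Array("_1", "_2", "_3")
val cols2: Array[String] = Array("_1", "_2", "_3")

// Arrays use reference equality, so two distinct arrays are not == even with equal contents.
val byReference = cols1 == cols2
// sameElements compares the arrays element by element, in order.
val byContents = cols1.sameElements(cols2)

println(s"== : $byReference, sameElements: $byContents") // == : false, sameElements: true
```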
Answer 1 (score: 0)
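I haven't run this code in an environment, but this is generally how you put the column names into a Seq and compare two Seqs. If the two sequences have the same members in the same order, equals returns true; if the members or their order differ, equals returns false.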
val tableSeq = Seq("name", "address", "zip") // simulates a Seq that you can retrieve from your table
val inputdf = spark.read.json("path") // read some external data into a DataFrame
val columnListUnzipped = inputdf.dtypes.unzip // unzip gives a tuple of (column names, column types)
val columnList = columnListUnzipped._1.toSeq // all column names; toSeq because an Array never equals a Seq
val isEqual = tableSeq.equals(columnList) // compare the two sequences with Scala's built-in equals
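Because `Seq` equality is element by element and order-sensitive, a file that contains the expected columns in a different order fails this check. If order should not matter, one option is to sort both sides first. A minimal sketch, with hard-coded lists as hypothetical stand-ins for `tableSeq` and the DataFrame's column names:

```scala
val tableSeq = Seq("name", "address", "zip")    // expected columns (stand-in for the table)
val fileColumns = Seq("zip", "name", "address") // same columns, different order (stand-in for the file)

// Plain Seq equality: same elements in the same order.
val orderSensitive = tableSeq == fileColumns
// Sorting both sides first makes the comparison ignore column order.
val orderInsensitive = tableSeq.sorted == fileColumns.sorted

println(s"order-sensitive: $orderSensitive, order-insensitive: $orderInsensitive")
// order-sensitive: false, order-insensitive: true
```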
Answer 2 (score: 0)
This is how I accomplished the task:
import scala.util.control.Breaks._

// Upper-case first, then sort, so the resulting order does not depend on the original letter case
val expectedCol = dfMetaDataFileTracker.select("COLUMNNAME").collect().map(_.getString(0).toUpperCase).sorted.toList
val receivedCol = dfVendorFile.columns.map(_.toUpperCase).sorted.toList
// List equality already compares lengths, so a separate size check is not needed
if (expectedCol == receivedCol) {
  println("File schema matches the expected schema!")
  break() // break() only works inside an enclosing breakable { ... } block
} else {
  println("File schema does not match the expected schema!")
  break()
}
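When the schemas differ, the mismatch email is more useful if it says which columns are missing and which are unexpected. A minimal sketch using `diff` on upper-cased names, in the spirit of the answer above; the function name `schemaDiff` and the message format are hypothetical.

```scala
import java.util.Locale

// Hypothetical helper: returns (missing, unexpected) column names, compared case-insensitively.
def schemaDiff(expected: Seq[String], received: Seq[String]): (Seq[String], Seq[String]) = {
  // Locale.ROOT avoids locale-specific upper-casing (e.g. the Turkish dotted i)
  val exp = expected.map(_.toUpperCase(Locale.ROOT))
  val rec = received.map(_.toUpperCase(Locale.ROOT))
  // diff is a multiset difference, so duplicated names are accounted for as well
  (exp.diff(rec), rec.diff(exp))
}

val (missing, unexpected) = schemaDiff(Seq("name", "Address", "zip"), Seq("NAME", "address", "zipcode"))
if (missing.isEmpty && unexpected.isEmpty)
  println("File schema matches the expected schema!")
else
  println(s"Schema mismatch; missing: ${missing.mkString(", ")}; unexpected: ${unexpected.mkString(", ")}")
// Schema mismatch; missing: ZIP; unexpected: ZIPCODE
```

Both lists being empty is equivalent to the sorted-list equality used above, so the same helper can drive both the "continue" and the "send email" branches.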