How to use the getInstance function on a dataset and filter data based on data type

Asked: 2019-06-27 07:56:53

Tags: scala apache-spark

I want to understand the getInstance function on datasets. I have a dataset, and I want to validate each column against its expected data type (such as Int, String, and Date); if any field in a row has the wrong type, I want to filter that row out.

My input dataset has the schema (Int, String, String, Date):

import org.apache.spark.sql.types._
import spark.implicits._  // needed for .toDF(); assumes a SparkSession named `spark`

case class Test(ID: Int, AirName: String, Place: String, TakeoffDate: String)
// myFile is a comma-separated text RDD; note that x(0).toInt throws for non-numeric IDs such as "Three"
val df = myFile.map(x => x.split(",")).map(x => Test(x(0).toInt, x(1), x(2), x(3))).toDF()

+-----+-------+-----+-----------+
|   ID|AirName|Place|TakeoffDate|
+-----+-------+-----+-----------+
|    1|  Delta|  Aus|    1/11/18|
|    2|  Delta|     |    10/5/19|
|Three|   null|  New| 15/10/2018|
|    4| JetAir|  Aus|    11/6/15|
+-----+-------+-----+-----------+
After creating the dataset, the expected output is Dataset1:
+-----+-------+-----+-----------+
|   ID|AirName|Place|TakeoffDate|
+-----+-------+-----+-----------+
|    1|  Delta|  Aus|    1/11/18|
|    2|  Delta|     |    10/5/19|
|    4| JetAir|  Aus|    11/6/15|
+-----+-------+-----+-----------+


Dataset2
+-----+-------+-----+-----------+
|   ID|AirName|Place|TakeoffDate|
+-----+-------+-----+-----------+
|Three|   null|  New| 15/10/2018|
|    4| JetAir|  Aus|    11/6/15|
+-----+-------+-----+-----------+
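Since the question has no accepted answer, here is one possible sketch of the filtering step: keep every column as String first, flag each row with a small UDF that tries to parse the typed fields, and split the DataFrame into valid and invalid rows. It assumes a SparkSession named `spark` and the comma-separated `myFile` RDD from the question; `Raw` and `isIntLike` are hypothetical names.

    import scala.util.Try
    import org.apache.spark.sql.functions.udf
    import spark.implicits._

    // Keep all columns as String so malformed values like "Three" do not throw at parse time.
    case class Raw(ID: String, AirName: String, Place: String, TakeoffDate: String)

    val raw = myFile.map(_.split(",", -1))
                    .map(x => Raw(x(0), x(1), x(2), x(3)))
                    .toDF()

    // true when the value parses as Int; a similar Try-based check could be
    // added for TakeoffDate (e.g. parsing it with a SimpleDateFormat).
    val isIntLike = udf((s: String) => Try(s.trim.toInt).isSuccess)

    val dataset1 = raw.filter(isIntLike($"ID"))   // rows whose ID is a valid Int
    val dataset2 = raw.filter(!isIntLike($"ID"))  // rows with a type error in ID

With the sample data, `dataset1` would keep rows 1, 2, and 4, and `dataset2` would keep the "Three" row; extending `isIntLike` to more columns follows the same pattern.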

0 Answers:

No answers yet