By default, when I load the data, every column gets treated as string type. The data looks like this:
firstName,lastName,age,doj
dileep,gog,21,2016-01-01
avishek,ganguly,21,2016-01-02
shreyas,t,20,2016-01-03
After updating the schema of the RDD, it looks like this:
temp.printSchema
|-- firstName: string (nullable = true)
|-- lastName: string (nullable = true)
|-- age: string (nullable = true)
|-- doj: date (nullable = true)
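With doj as a true date column, a predicate like doj > cast('2016-01-02' as date) is evaluated as a calendar comparison rather than a string comparison. As a minimal plain-Scala sketch of that comparison (no Spark needed; java.sql.Date is the JVM type backing Spark's DateType):

```scala
import java.sql.Date

// java.sql.Date backs Spark SQL's DateType; after() compares calendar order.
val doj    = Date.valueOf("2016-01-03")
val cutoff = Date.valueOf("2016-01-02")

// Equivalent of: where doj > cast('2016-01-02' as date)
println(doj.after(cutoff))  // true
```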
I register a temp table and query it:
temp.registerTempTable("temptable");
val temp1 = sqlContext.sql("select * from temptable")
temp1.show()
+---------+--------+---+----------+
|firstName|lastName|age| doj|
+---------+--------+---+----------+
| dileep| gog| 21|2016-01-01|
| avishek| ganguly| 21|2016-01-02|
| shreyas| t| 20|2016-01-03|
+---------+--------+---+----------+
val temp2 = sqlContext.sql("select * from temptable where doj > cast('2016-01-02' as date)")
But when I try to see what it gives me:
temp2: org.apache.spark.sql.DataFrame = [firstName: string, lastName: string, age: string, doj: date]
And when I do
temp2.show()
java.lang.ClassCastException: java.lang.String cannot be cast to java.lang.Integer
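This ClassCastException suggests that somewhere during the schema update a value that is physically stored as a String in a Row is later read back as an Integer (plausibly the age column, though the question does not show the schema-update code, so this is an assumption). A minimal sketch of that JVM-level failure mode, outside Spark:

```scala
import scala.util.Try

// A field that physically holds a String...
val ageField: Any = "21"

// ...fails only when something tries to read it as an Integer, which is
// why the error appears at show() time rather than when the schema is set.
val read = Try(ageField.asInstanceOf[Integer])
println(read)  // Failure(java.lang.ClassCastException: ...)
```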
Answer (score 0):
So I tried your code and it works for me. I suspect the problem is in how you originally changed the schema, which is hard for me to tell (granted, it's a bit hard to read since you posted it in a comment — you should update the question with the code).
Anyway, here is what I did.
First, simulating your input:
val df = sc.parallelize(List(("dileep","gog","21","2016-01-01"), ("avishek","ganguly","21","2016-01-02"), ("shreyas","t","20","2016-01-03"))).toDF("firstName", "lastName", "age", "doj")
Then:
import org.apache.spark.sql.functions._
val temp = df.withColumn("doj", to_date('doj))
temp.registerTempTable("temptable");
val temp2 = sqlContext.sql("select * from temptable where doj > cast('2016-01-02' as date)")
Doing a temp2.show() displays the expected result:
+---------+--------+---+----------+
|firstName|lastName|age| doj|
+---------+--------+---+----------+
| shreyas| t| 20|2016-01-03|
+---------+--------+---+----------+
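For reference, the filter that the SQL query performs is equivalent to the following plain-Scala collection filter over the same three rows (a sketch without Spark, with java.sql.Date standing in for the parsed doj column):

```scala
import java.sql.Date

// The same three rows as the question, with doj already parsed to a date.
val rows = List(
  ("dileep",  "gog",     "21", Date.valueOf("2016-01-01")),
  ("avishek", "ganguly", "21", Date.valueOf("2016-01-02")),
  ("shreyas", "t",       "20", Date.valueOf("2016-01-03"))
)

// Equivalent of: select * from temptable where doj > cast('2016-01-02' as date)
val cutoff = Date.valueOf("2016-01-02")
val result = rows.filter(_._4.after(cutoff))

println(result.map(_._1))  // List(shreyas)
```

Only the third row survives the filter, matching the temp2.show() output above.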