I am using Spark 1.5.1 and Hive 1.2.1.
When I run the following snippet in spark-shell --master yarn --deploy-mode client:
val hiveContext = new org.apache.spark.sql.hive.HiveContext(sc)
val queryLeft = "SELECT t1.* FROM (SELECT t2.*, row_number() OVER (PARTITION BY CAST(TRIM(t2.pk) AS DECIMAL(31,8)) ORDER BY t2.create_dt DESC) AS R FROM myschema.mytable t2 WHERE t2.part_dt='mydate' AND t2.part_seq='myseq') t1 WHERE t1.R = 1"
val dfLeft = hiveContext.sql(queryLeft)
val firstCount = dfLeft.count
val secondCount = dfLeft.count
I get this result. Both counts are wrong (and they are not even equal to each other!):
scala> print (firstCount, secondCount)
(1865,2373)
When I run the same snippet in plain spark-shell, I get the correct result:
scala> print (firstCount, secondCount)
(2395,2395)
Am I doing something wrong?
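
For reference, one sanity check that might help narrow this down: keeping only R = 1 should leave exactly one row per partition key, so the count should match the number of distinct keys under the same filters. A minimal sketch, assuming the same spark-shell session, table, and filters as above (the names expectedQuery and expectedCount are made up for illustration):

```scala
// Assumes the spark-shell session above, where hiveContext is already defined.
// Keeping only R = 1 leaves one row per partition key, so dfLeft.count should
// equal the number of distinct keys selected by the same WHERE clause.
// Caveat: COUNT(DISTINCT ...) skips NULL keys, while PARTITION BY puts all NULL
// keys into one partition, so the two numbers can differ by 1 if any key is NULL.
val expectedQuery = "SELECT COUNT(DISTINCT CAST(TRIM(t2.pk) AS DECIMAL(31,8))) FROM myschema.mytable t2 WHERE t2.part_dt='mydate' AND t2.part_seq='myseq'"
val expectedCount = hiveContext.sql(expectedQuery).first().getLong(0)
```

Comparing expectedCount against firstCount and secondCount in both launch modes would show which of the numbers is actually the right one, without relying on the window function.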