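I want to filter the records in the target table whose date is greater than or equal to the min(date) of the source table (both tables share a common ID).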
val cm_record_rdd = hiveContext.sql("select t1.* from target t1 left outer join source t2 on t1.id = t2.id")
val min_date_rdd = hiveContext.sql("select min(date) as min_date from source")
val src_rdd = hiveContext.sql("select * from source")
How can I filter the records of cm_record using target.date >= source.min_date?
I tried the following:
src_rdd.filter(cm_record_rdd("start_dt") >= min(src_rdd("date")))
src_rdd.filter(cm_record_rdd("start_dt") >= min_date_rdd("min_date"))
Neither of them worked.
Solution:
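// Collect the single aggregate row to the driver and read min(date) as a plain value
val min_date = hiveContext.sql("select min(date) as min_date from source").collect.head.get(0)
// Filter the target records by comparing the start_dt column with that value
cm_record_rdd.filter(cm_record_rdd("start_dt") >= min_date)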
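
If you would rather not collect the value to the driver, the same filter can probably be expressed in one query by joining target against the one-row min(date) result. This is only a sketch under assumptions: it uses a local SparkSession instead of the hiveContext from the question, the object FilterByMinDate and its helpers are hypothetical names, and the toy rows are made up just to have something to run (dates are ISO strings, so they compare correctly as text).

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}

object FilterByMinDate {
  // Hypothetical toy tables standing in for the real Hive tables.
  def registerToyTables(spark: SparkSession): Unit = {
    import spark.implicits._
    Seq((1, "2020-01-01"), (2, "2020-03-01"), (3, "2020-05-01"))
      .toDF("id", "start_dt").createOrReplaceTempView("target")
    Seq((1, "2020-02-01"), (2, "2020-04-01"))
      .toDF("id", "date").createOrReplaceTempView("source")
  }

  // Keeps the target rows whose start_dt is >= the smallest date in source.
  // The subquery yields exactly one row, so the join does not multiply rows.
  def filterTarget(spark: SparkSession): DataFrame =
    spark.sql(
      """select t1.*
        |from target t1
        |join (select min(`date`) as min_date from source) m
        |  on t1.start_dt >= m.min_date""".stripMargin)

  def main(args: Array[String]): Unit = {
    // Assumption: local mode, only for trying the query out.
    val spark = SparkSession.builder()
      .master("local[1]")
      .appName("filter-by-min-date")
      .getOrCreate()
    registerToyTables(spark)
    // With the toy data, min(date) is 2020-02-01, so ids 2 and 3 remain.
    filterTarget(spark).show()
    spark.stop()
  }
}
```

The collect-based version above is fine when you need the minimum date as a value elsewhere in the driver code. The join keeps everything inside the query plan, which may be more convenient if the filter is one step in a longer SQL pipeline.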