I have a SQL query that I want to convert to Spark Scala:
SELECT aid, DId, BM, BY
FROM (SELECT DISTINCT aid, DId, BM, BY, TO FROM SU WHERE cd = 2) t
GROUP BY aid, DId, BM, BY HAVING COUNT(*) > 1;
SU is my DataFrame.
I tried this:
sqlContext.sql("""
SELECT aid, DId, BM, BY
FROM (SELECT DISTINCT aid, DId, BM, BY, TO FROM SU WHERE cd = 2) t
GROUP BY aid, DId, BM, BY HAVING COUNT(*) > 1
""")
but instead I need to do this using my DataFrame directly.
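As a side note, the SQL approach also works against a DataFrame if you first register it as a temporary view. A minimal sketch, assuming `spark` is your `SparkSession` (the view name `SU` is chosen here to match the query):

```scala
// Register the DataFrame under a name so Spark SQL can resolve it
SU.createOrReplaceTempView("SU")

val result = spark.sql("""
  SELECT aid, DId, BM, BY
  FROM (SELECT DISTINCT aid, DId, BM, BY, TO FROM SU WHERE cd = 2) t
  GROUP BY aid, DId, BM, BY HAVING COUNT(*) > 1
""")
```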
Answer 0 (score: 1)
This should be the DataFrame equivalent:
// Requires `import spark.implicits._` for the $"col" syntax
SU.filter($"cd" === 2)                       // WHERE cd = 2
  .select("aid", "DId", "BM", "BY", "TO")
  .distinct()                                // SELECT DISTINCT over the five columns
  .groupBy("aid", "DId", "BM", "BY")
  .count()                                   // adds a "count" column per group
  .filter($"count" > 1)                      // HAVING COUNT(*) > 1
  .select("aid", "DId", "BM", "BY")
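Putting it together, here is a self-contained sketch you can run locally. The sample rows and column values are invented purely for illustration; only the column names and the pipeline itself come from the question:

```scala
import org.apache.spark.sql.SparkSession

object DedupCount {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("dedup-count")
      .getOrCreate()
    import spark.implicits._  // enables the $"col" syntax and toDF

    // Hypothetical sample data standing in for SU:
    // the first two rows share (aid, DId, BM, BY) but differ in TO,
    // so that group survives the HAVING COUNT(*) > 1 filter.
    val SU = Seq(
      (1, "d1", "b1", "y1", "t1", 2),
      (1, "d1", "b1", "y1", "t2", 2),
      (2, "d2", "b2", "y2", "t3", 2)
    ).toDF("aid", "DId", "BM", "BY", "TO", "cd")

    val result = SU.filter($"cd" === 2)
      .select("aid", "DId", "BM", "BY", "TO")
      .distinct()
      .groupBy("aid", "DId", "BM", "BY")
      .count()
      .filter($"count" > 1)
      .select("aid", "DId", "BM", "BY")

    result.show()
    spark.stop()
  }
}
```

With the sample data above, only the `aid = 1` group has more than one distinct `TO` value, so `result` contains a single row.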