Converting a SQL query to Spark

Date: 2017-01-25 10:11:44

Tags: scala apache-spark spark-dataframe

I have a SQL query that I want to convert to Spark (Scala):

SELECT aid, DId, BM, BY
FROM (SELECT DISTINCT aid, DId, BM, BY, TO FROM SU WHERE cd = 2) t
GROUP BY aid, DId, BM, BY HAVING COUNT(*) > 1;

SU is my DataFrame.

I tried this:
sqlContext.sql("""
  SELECT aid,DId,BM,BY 
  FROM (SELECT DISTINCT aid,DId,BM,BY,TO FROM SU WHERE cd =2) t 
  GROUP BY aid,DId,BM,BY HAVING COUNT(*) >1
""")
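As a side note, for the SQL-string approach above to work at all, the name `SU` must be visible to the SQL parser. A minimal sketch, assuming `SU` is an existing DataFrame and `sqlContext` a Spark 1.x `SQLContext`:

```scala
// Register the DataFrame under the name "SU" so sqlContext.sql can resolve it.
// Spark 1.x API shown; in Spark 2.x use SU.createOrReplaceTempView("SU")
// and spark.sql(...) instead.
SU.registerTempTable("SU")
```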

But instead of passing a SQL string, I need to express this directly with the DataFrame API on my DataFrame.

1 answer:

Answer 0: (score: 1)

This should be the DataFrame equivalent:

SU.filter($"cd" === 2)
  .select("aid","DId","BM","BY","TO")
  .distinct()
  .groupBy("aid","DId","BM","BY")
  .count()
  .filter($"count" > 1)
  .select("aid","DId","BM","BY")
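To see the chain end to end, here is a self-contained sketch with made-up toy data (the rows, the `local[*]` master, and the Spark 2.x `SparkSession` setup are all assumptions; column names follow the question). The key `a1` appears with two distinct `TO` values under `cd = 2`, so it is the only group that survives the `count > 1` filter:

```scala
import org.apache.spark.sql.SparkSession

object HavingCountExample {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("having-count-demo")
      .getOrCreate()
    import spark.implicits._

    // Toy rows: (aid, DId, BM, BY, TO, cd).
    // a1 has two distinct TO values for the same (aid, DId, BM, BY) key.
    val SU = Seq(
      ("a1", "d1", "b1", "y1", "t1", 2),
      ("a1", "d1", "b1", "y1", "t2", 2),
      ("a2", "d1", "b1", "y1", "t1", 2),
      ("a3", "d1", "b1", "y1", "t1", 1)  // dropped: cd != 2
    ).toDF("aid", "DId", "BM", "BY", "TO", "cd")

    // DataFrame translation of the HAVING COUNT(*) > 1 query.
    val result = SU.filter($"cd" === 2)
      .select("aid", "DId", "BM", "BY", "TO")
      .distinct()
      .groupBy("aid", "DId", "BM", "BY")
      .count()
      .filter($"count" > 1)
      .select("aid", "DId", "BM", "BY")

    result.show()
    spark.stop()
  }
}
```

With the toy data above, only the `a1` group is emitted, matching what the SQL `HAVING COUNT(*) > 1` would return.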