Spark Inner Join and get Min()

Date: 2017-01-27 11:13:08

Tags: scala apache-spark

I cannot get the join and the resulting columns to work correctly, and I need the min() of a column after the join:
    SELECT
        t.aid,
        t.DId,
        t.BY,
        t.BM,
        t.cid,
        MIN(p.PS) AS PS
    FROM
        Tempity t INNER JOIN ples p
        ON t.cid = p.cid
        AND p.PType = t.TeO
        AND p.pto = 'cccc'
        AND p.cid = 2
    GROUP BY
        t.aid,
        t.DId,
        t.BY,
        t.BM,
        t.cid;
I am converting the above SQL query to:
    val RS = Tempity.join(DF_PLES,
        Tempity("cid") <=> DF_PLES("cid") &&
        DF_PLES("clientid") <=> 2 &&
        Tempity("TO") <=> DF_PLES("PType") &&
        DF_PLES("pto") <=> "cccc",
        "inner")
      .select("aid", "DId", "BM", "BY", "TO", "cid")
      .groupBy("aid", "DId", "BM", "BY")
      .select("aid", "DId", "BM", "BY", "TO", "cid")
      .show

I cannot figure out what I am doing wrong. The error is:

org.apache.spark.sql.AnalysisException: Reference 'cid' is ambiguous, could be: cid#4058, cid#13063L.;

1 answer:

Answer 0 (score: 0):

Use Tempity("cid") instead of the bare column name cid, since cid is ambiguous after the join.
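For example (a sketch reusing the question's DataFrame and column names, not part of the original answer), qualifying the duplicated column with the DataFrame it comes from removes the ambiguity:

    // Sketch only: all names (Tempity, DF_PLES, aid, DId, BM, BY, TO, cid,
    // clientid, pto) are taken from the question's code.
    val joined = Tempity.join(DF_PLES,
        Tempity("cid") <=> DF_PLES("cid") &&
        DF_PLES("clientid") <=> 2 &&
        Tempity("TO") <=> DF_PLES("PType") &&
        DF_PLES("pto") <=> "cccc",
        "inner")

    // Selecting the plain string "cid" triggers the AnalysisException, because both
    // input DataFrames contribute a cid column; Tempity("cid") is unambiguous.
    val selected = joined.select(Tempity("aid"), Tempity("DId"), Tempity("BM"),
      Tempity("BY"), Tempity("TO"), Tempity("cid"))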

With the MIN aggregation, the converted query becomes:

    import org.apache.spark.sql.functions._ // for min() and col()

    val RS = Tempity.join(DF_PLES,
        Tempity("cid") <=> DF_PLES("cid") &&
        DF_PLES("clientid") <=> 2 &&
        Tempity("TO") <=> DF_PLES("PType") &&
        DF_PLES("pto") <=> "cccc",
        "inner")
      .groupBy(col("aid"), col("DId"), col("BM"), col("BY"), Tempity("cid"))
      .agg(min(DF_PLES("PS")))

    RS.show()

Another approach is to run the same SQL directly on the SparkSession.
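A minimal sketch of that SQL-based alternative, assuming a SparkSession named spark and the question's DataFrames; the temp-view names t and p are illustrative:

    // Register the DataFrames as temporary views, then run the original SQL.
    Tempity.createOrReplaceTempView("t")
    DF_PLES.createOrReplaceTempView("p")

    val RS2 = spark.sql("""
      SELECT t.aid, t.DId, t.BY, t.BM, t.cid, MIN(p.PS) AS PS
      FROM t INNER JOIN p
        ON t.cid = p.cid
       AND p.PType = t.TeO
       AND p.pto = 'cccc'
       AND p.cid = 2
      GROUP BY t.aid, t.DId, t.BY, t.BM, t.cid
    """)
    RS2.show()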