
Why do df.count() and df.rdd.count() return two different values with this SQL script?

Time: 2018-09-20 02:57:52

Tags: apache-spark hive apache-spark-sql

I ran the following script with hiveContext.sql:

I got the result shown in the screenshot below the query.

with nt as (
    select label, score from (
        select * from (select label, score, row_number() over (order by score desc) as position from t1)t_1  
        join 
        (select count(*) as countall from t1)t_2 
    )ta 
    where position <= countall * 0.4
)

[Screenshot of this SQL execution]
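
For reference, here is a minimal sketch in plain Python of what the query appears to compute: rank the rows of t1 by score in descending order and keep those whose position is at most 40% of the total row count. The rows in `t1` and the helper name `top_fraction` are hypothetical, made up for illustration. This is not Spark code and does not reproduce the discrepancy; it only shows the row count one would expect from either count() call.

```python
# Hypothetical stand-in for table t1: (label, score) rows, including tied scores.
t1 = [("a", 0.9), ("b", 0.8), ("c", 0.8), ("d", 0.7), ("e", 0.6),
      ("f", 0.5), ("g", 0.5), ("h", 0.4), ("i", 0.3), ("j", 0.1)]


def top_fraction(rows, fraction=0.4):
    """Mimic the query: rank by score descending, keep position <= countall * fraction."""
    countall = len(rows)  # select count(*) as countall from t1
    ranked = sorted(rows, key=lambda r: r[1], reverse=True)  # order by score desc
    # row_number() starts at 1, so positions are enumerated from 1.
    return [(label, score)
            for position, (label, score) in enumerate(ranked, start=1)
            if position <= countall * fraction]


nt = top_fraction(t1)
print(len(nt))  # 4, since 10 rows * 0.4 = 4.0
```

Python's sort resolves tied scores by input order, whereas row_number() in Spark gives no guarantee about the order of ties. Which tied rows end up in the result can therefore vary, but the number of rows kept does not depend on that order.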

Something odd happens when I call count() on the RDD and on the DataFrame.

As the screenshot shows, the two outputs differ. ...

Can anyone help me? Thanks a lot!!!

0 answers:

No answers