What is the difference between querying from Impala and querying from Hive?

Time: 2019-07-12 07:43:12

Tags: sql scala apache-spark dataframe hive

I have a Hive source table.



I am trying to fetch all of the table's rows and load them into a DataFrame:

    select count(*) from dev_lkr_send.pz_send_param_ano; -- 25283 rows

I did the following:

Spark 2 - Scala

When I execute:

    val dfMet = spark.sql(s"""
      SELECT CD_ANOMALIE,
             CD_FAMILLE,
             libelle AS LIB_ANOMALIE,
             to_date(substr(MAJ_DATE, 1, 19), 'YYYY-MM-DD HH24:MI:SS') AS DT_MAJ,
             CLASSIFICATION,
             NB_REJEUX,
             case when indic_cd_erreur = 'O' then 1 else 0 end AS TOP_INDIC_CD_ERREUR,
             case when invalidation_coordonnee = 'O' then 1 else 0 end AS TOP_COORDONNEE_INVALIDE,
             case when typ_mvt = 'S' then 1 else 0 end AS TOP_SUPP,
             case when typ_mvt = 'S' then to_date(substr(dt_capt, 1, 19), 'YYYY-MM-DD HH24:MI:SS') else null end AS DT_SUPP
      FROM ${use_database}.pz_send_param_ano""")

and then call dfMet.count(), it returns a different number of rows than the count above.

Any idea where the difference comes from?


EDIT1:

Running the same query from Hive returns the same value as the DataFrame (I had previously been running the query from the Impala UI).

Can someone explain the difference? I am working in Hue 4.

1 answer:

Answer 0 (score: 0)

One potential source of the difference is that your Hive query is returning a stale result from the metastore instead of running a fresh count against the table.

If hive.compute.query.using.stats is set to true and the table has statistics computed, Hive answers count(*) directly from the metastore. If that is the case here, your statistics are probably out of date and you need to recompute them.
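As a sketch of how to check this (the table name is taken from the question; verify the property value in your own environment), you could either force Hive to perform a real scan for this session or refresh the table statistics:

```sql
-- Show the current value of the property
SET hive.compute.query.using.stats;

-- Option 1: disable stats-based answers for this session,
-- forcing count(*) to actually scan the table
SET hive.compute.query.using.stats=false;
SELECT count(*) FROM dev_lkr_send.pz_send_param_ano;

-- Option 2: recompute the table statistics so the values
-- stored in the metastore are fresh again
ANALYZE TABLE dev_lkr_send.pz_send_param_ano COMPUTE STATISTICS;
```

If the count from Option 1 matches the Spark DataFrame count, stale statistics were the cause of the mismatch.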