Efficiently calculating a percentage in Hive or SQL

Time: 2014-02-01 01:04:35

Tags: sql hive

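SELECT
      -- 'FRAUD' stands in for the actual fraud tag value (assumed to be a string)
      (CASE WHEN tag = 'FRAUD' THEN 0
      ELSE 1 END) AS fraud_tag,
      COUNT(DISTINCT account_id) AS distinct_account_count
    FROM fraud_tags a
    GROUP BY
      -- group by the same expression that is selected
      (CASE WHEN tag = 'FRAUD' THEN 0
      ELSE 1 END)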
Result:
fraud_tag   distinct_account_count
0           100
1           500

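Now I want to compute the fraud rate: the number of distinct accounts tagged as fraud divided by the total number of distinct accounts. At the moment I have to do this in two steps. Any suggestions for making it more efficient?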

1 Answer:

Answer 0 (score: 0)

The simplest way is to compute all the values in a single row, using conditional aggregation:

SELECT COUNT(DISTINCT CASE WHEN tag = 'FRAUD' THEN account_id END) AS distinct_fraud,
       COUNT(DISTINCT CASE WHEN tag = 'FRAUD' THEN NULL ELSE account_id END) AS distinct_notfraud,
       -- multiply by 1.0 so the division is not done in integer arithmetic
       (COUNT(DISTINCT CASE WHEN tag = 'FRAUD' THEN account_id END) * 1.0 / COUNT(DISTINCT account_id)
       ) AS fraud_rate
FROM fraud_tags ft;
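
To sanity-check the single-pass query, here is a minimal sketch that runs it against a small in-memory table. It uses Python's built-in sqlite3 module rather than Hive, so it only illustrates the logic. The sample rows, the tag values 'FRAUD' and 'OK', and the helper name fraud_summary are all made up for illustration.

```python
import sqlite3

# Hypothetical toy data: (account_id, tag). The tag values are assumptions.
ROWS = [
    (1, "FRAUD"),
    (1, "FRAUD"),  # duplicate row: account 1 must still be counted once
    (2, "FRAUD"),
    (2, "OK"),     # account 2 appears under both tags
    (3, "OK"),
    (4, "OK"),
]

# Same shape as the query in the answer, with the tag value quoted as a string.
QUERY = """
SELECT COUNT(DISTINCT CASE WHEN tag = 'FRAUD' THEN account_id END) AS distinct_fraud,
       COUNT(DISTINCT CASE WHEN tag = 'FRAUD' THEN NULL ELSE account_id END) AS distinct_notfraud,
       COUNT(DISTINCT CASE WHEN tag = 'FRAUD' THEN account_id END) * 1.0
           / COUNT(DISTINCT account_id) AS fraud_rate
FROM fraud_tags ft
"""


def fraud_summary(rows):
    """Load rows into an in-memory table and return the one-row summary."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE fraud_tags (account_id INTEGER, tag TEXT)")
        conn.executemany("INSERT INTO fraud_tags VALUES (?, ?)", rows)
        return conn.execute(QUERY).fetchone()
    finally:
        conn.close()


distinct_fraud, distinct_notfraud, fraud_rate = fraud_summary(ROWS)
print(distinct_fraud, distinct_notfraud, fraud_rate)
```

Note that an account carrying both tags (account 2 here) is counted in both distinct_fraud and distinct_notfraud, so the two counts can add up to more than the total number of distinct accounts. The fraud rate is unaffected because its denominator is COUNT(DISTINCT account_id) over the whole table.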