SQL query to calculate a percentage from a subquery

Time: 2018-03-09 14:11:34

Tags: sql postgresql subquery aggregate-functions

I have a table: log

+----+---------------------+---------------+
| ID |        Time         |    Status     |
+----+---------------------+---------------+
|  1 | 2016-07-19 03:20:12 | 200 OK        |
|  2 | 2016-07-20 05:20:12 | 404 NOT FOUND |
|  3 | 2016-07-19 00:00:00 | 200 OK        |
|  4 | 2016-07-20 10:20:12 | 404 NOT FOUND |
|  5 | 2016-08-05 07:00:02 | 404 NOT FOUND |
+----+---------------------+---------------+
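For anyone who wants to reproduce this, here is a minimal setup sketch. The column types are assumptions inferred from the sample data, not taken from the real schema.

```sql
-- Assumed schema: the types below are a guess based on the sample rows.
CREATE TABLE log (
    id     integer PRIMARY KEY,
    time   timestamp NOT NULL,
    status text NOT NULL
);

-- The five sample rows shown in the table above.
INSERT INTO log (id, time, status) VALUES
    (1, '2016-07-19 03:20:12', '200 OK'),
    (2, '2016-07-20 05:20:12', '404 NOT FOUND'),
    (3, '2016-07-19 00:00:00', '200 OK'),
    (4, '2016-07-20 10:20:12', '404 NOT FOUND'),
    (5, '2016-08-05 07:00:02', '404 NOT FOUND');
```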

I need to group the rows with the "404 NOT FOUND" status by date and show each date's share of the total number of such rows (see below).

Desired result

+---------------------+---------+
|        Date         | Errors  |
+---------------------+---------+
| 2016-07-20 00:00:00 | 0.66666 |
| 2016-08-05 00:00:00 | 0.33333 |
+---------------------+---------+

I can't figure out how to achieve this in a single query. So far, I have started with this query:

SELECT date_trunc('day',time) as "date", count(time) as errors
FROM log
WHERE status = '404 NOT FOUND'
GROUP BY date
ORDER BY errors DESC;

This query returns:

+---------------------+--------+
|        Date         | Errors |
+---------------------+--------+
| 2016-07-20 00:00:00 |      2 |
| 2016-08-05 00:00:00 |      1 |
+---------------------+--------+

Any ideas or references for getting the desired result?

5 answers:

Answer 0: (score: 3)

To get the desired output, try the following query:

SELECT date_trunc('day',time) as "date", round((
  count(*)::decimal/(
            select count(*) from log WHERE status = '404 NOT FOUND')
),2) as errors
FROM log
WHERE status = '404 NOT FOUND'
GROUP BY date
ORDER BY errors DESC;

This returns:

date                       errors
2016-07-20T00:00:00.000Z    0.67
2016-08-05T00:00:00.000Z    0.33

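Here is a working Fiddle

Don't worry about the date format; in my schema I chose the timestamp type.

Answer 1: (score: 1)

I think a window function is the most elegant answer: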
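The `::decimal` cast in the query above matters because count(*) returns an integer type, and PostgreSQL truncates integer division. A quick check with literal values (a sketch, not part of the original Fiddle):

```sql
-- Both operands are integers, so the fractional part is dropped.
SELECT 2 / 3 AS integer_division;             -- 0

-- Casting one operand to decimal keeps the fraction.
SELECT 2::decimal / 3 AS decimal_division;

-- round(numeric, 2) then trims the result to two decimal places.
SELECT round(2::decimal / 3, 2) AS rounded;   -- 0.67
```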

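SELECT DISTINCT
   -- Cast to date rather than EXTRACT(day ...), which would return only
   -- the day of the month and merge the same day across different months.
   CAST(time AS date) AS date,
   CAST(
      count(*) OVER (PARTITION BY CAST(time AS date))
      AS double precision
   ) / count(*) OVER () as errors
FROM log
WHERE status = '404 NOT FOUND'
ORDER BY errors DESC;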
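A variant of the same window-function idea, offered as a sketch rather than a tested drop-in: grouping by day first and then running the window function over the grouped counts avoids the need for DISTINCT. Since sum() over a bigint yields numeric in PostgreSQL, the division keeps its fractional part without an explicit cast.

```sql
SELECT
    CAST(time AS date) AS date,
    -- count(*) is the per-day count; sum(count(*)) OVER () adds up the
    -- per-day counts across all groups, giving the overall 404 total.
    count(*) / sum(count(*)) OVER () AS errors
FROM log
WHERE status = '404 NOT FOUND'
GROUP BY CAST(time AS date)
ORDER BY errors DESC;
```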

Answer 2: (score: 0)

Try this answer.

Hope this helps you:

./spark-shell --jars ~/spark/spark-cassandra-connector/spark-cassandra-connector/target/full/scala-2.10/spark-cassandra-connector-assembly-2.0.5-121-g1a7fa1f8.jar
import com.datastax.spark.connector._

 val conf = new SparkConf(true).set("spark.cassandra.connection.host", "localhost")
val test = sc.cassandraTable("sensorkeyspace", "sensortable")
test.count

Answer 3: (score: 0)

I hope this helps. I used PostgreSQL 9.3.

select * 
from (
  with total as (select sum(1) as tot FROM log WHERE status = '404 NOT FOUND')
  SELECT date_trunc('day',time) as "date",
  cast(SUM(CASE WHEN status = '404 NOT FOUND' THEN 1 ELSE 0 END) as decimal) / total.tot as percentage
  FROM log , total
  group by date, total.tot
    ) t
where percentage > 0
ORDER BY percentage desc;
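The same CTE idea can probably be written a little more directly. The sketch below assumes the same log table; it moves the CTE to the top level and filters on the status in WHERE, so the CASE expression and the outer "percentage > 0" filter are no longer needed.

```sql
WITH total AS (
    -- Overall number of 404 rows, computed once.
    SELECT count(*) AS tot
    FROM log
    WHERE status = '404 NOT FOUND'
)
SELECT date_trunc('day', time) AS "date",
       -- Cast to decimal so the division is not truncated to an integer.
       count(*)::decimal / total.tot AS percentage
FROM log
CROSS JOIN total
WHERE status = '404 NOT FOUND'
GROUP BY date_trunc('day', time), total.tot
ORDER BY percentage DESC;
```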

Answer 4: (score: -1)

Why don't you try this:

select
  date_trunc('day',time) as "date",
  -- cast to decimal: dividing two integer counts would truncate to 0
  count(time)::decimal/(select count(*) from log where status='404 NOT FOUND') as errors
from log
where status = '404 NOT FOUND'
GROUP BY date_trunc('day',time);