I have a table: log
+----+---------------------+---------------+
| ID | Time | Status |
+----+---------------------+---------------+
| 1 | 2016-07-19 03:20:12 | 200 OK |
| 2 | 2016-07-20 05:20:12 | 404 NOT FOUND |
| 3 | 2016-07-19 00:00:00 | 200 OK |
| 4 | 2016-07-20 10:20:12 | 404 NOT FOUND |
| 5 | 2016-08-05 07:00:02 | 404 NOT FOUND |
+----+---------------------+---------------+
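I need to group the rows with the "404 NOT FOUND" status by date and show each date's share of the total number of such rows (see below).
Desired result: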
+---------------------+---------+
| Date | Errors |
+---------------------+---------+
| 2016-07-20 00:00:00 | 0.66666 |
| 2016-08-05 00:00:00 | 0.33333 |
+---------------------+---------+
I can't figure out how to achieve this in a single query. So far, I have started with this query:
SELECT date_trunc('day',time) as "date", count(time) as errors
FROM log
WHERE status = '404 NOT FOUND'
GROUP BY date
ORDER BY errors DESC;
This query returns:
+---------------------+--------+
| Date | Errors |
+---------------------+--------+
| 2016-07-20 00:00:00 | 2 |
| 2016-08-05 00:00:00 | 1 |
+---------------------+--------+
Any ideas or references for getting the desired result?
Answer 0 (score: 3)
To get the desired output, try the following query:
SELECT date_trunc('day',time) as "date", round((
count(*)::decimal/(
select count(*) from log WHERE status = '404 NOT FOUND')
),2) as errors
FROM log
WHERE status = '404 NOT FOUND'
GROUP BY date
ORDER BY errors DESC;
This returns:
date errors
2016-07-20T00:00:00.000Z 0.67
2016-08-05T00:00:00.000Z 0.33
Here is a working Fiddle.
Don't worry about the date format; in my schema I chose the timestamp type.
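The arithmetic behind this query can be checked outside the database. Below is a minimal Python sketch, not part of the original answer: it groups the "404 NOT FOUND" rows from the question by calendar day and divides each day's count by the total number of such rows. The function name `error_share_by_day` and the in-memory `rows` list are assumptions made for illustration.

```python
from collections import Counter
from datetime import datetime

# Sample rows copied from the question's log table: (id, time, status).
rows = [
    (1, "2016-07-19 03:20:12", "200 OK"),
    (2, "2016-07-20 05:20:12", "404 NOT FOUND"),
    (3, "2016-07-19 00:00:00", "200 OK"),
    (4, "2016-07-20 10:20:12", "404 NOT FOUND"),
    (5, "2016-08-05 07:00:02", "404 NOT FOUND"),
]


def error_share_by_day(rows, status="404 NOT FOUND"):
    """Map each day to its share of all rows that have the given status."""
    # Keep only matching rows and truncate each timestamp to its date,
    # which mirrors date_trunc('day', time) plus the WHERE clause.
    days = [
        datetime.strptime(ts, "%Y-%m-%d %H:%M:%S").date()
        for _, ts, row_status in rows
        if row_status == status
    ]
    total = len(days)
    if total == 0:
        return {}  # no matching rows, so there is nothing to divide
    return {day: count / total for day, count in Counter(days).items()}


shares = error_share_by_day(rows)
# Largest share first, like ORDER BY errors DESC.
for day, share in sorted(shares.items(), key=lambda kv: kv[1], reverse=True):
    print(day, round(share, 2))
```

With the sample data this prints 0.67 for 2016-07-20 and 0.33 for 2016-08-05, matching the query output above.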
Answer 1 (score: 1)
I think a window function is the most elegant answer:
SELECT DISTINCT
CAST(time AS date) AS date, -- calendar day of the request
CAST(
count(*) OVER (PARTITION BY CAST(time AS date))
AS double precision
) / count(*) OVER () as errors
FROM log
WHERE status = '404 NOT FOUND'
ORDER BY errors DESC;
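To see the window-function approach run end to end without a PostgreSQL server, here is a rough sketch using Python's built-in sqlite3 module as a stand-in. This is an assumption-laden adaptation, not the answer's exact query: SQLite has no `double precision` cast or date type, so it uses `CAST(... AS REAL)` and `date(time)` instead, and it needs SQLite 3.25 or newer for window functions.

```python
import sqlite3

# In-memory stand-in for the question's table (SQLite, not PostgreSQL).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE log (id INTEGER PRIMARY KEY, time TEXT, status TEXT)")
conn.executemany(
    "INSERT INTO log VALUES (?, ?, ?)",
    [
        (1, "2016-07-19 03:20:12", "200 OK"),
        (2, "2016-07-20 05:20:12", "404 NOT FOUND"),
        (3, "2016-07-19 00:00:00", "200 OK"),
        (4, "2016-07-20 10:20:12", "404 NOT FOUND"),
        (5, "2016-08-05 07:00:02", "404 NOT FOUND"),
    ],
)

# Per-day count (PARTITION BY the day) divided by the count over all
# filtered rows (empty OVER ()). DISTINCT collapses the repeated rows
# that a window function leaves behind, one per input row.
query = """
SELECT DISTINCT
    date(time) AS day,
    CAST(count(*) OVER (PARTITION BY date(time)) AS REAL)
        / count(*) OVER () AS errors
FROM log
WHERE status = '404 NOT FOUND'
ORDER BY errors DESC
"""
result = conn.execute(query).fetchall()
for day, errors in result:
    print(day, round(errors, 5))
```

The window functions are evaluated after the WHERE filter, so both counts see only the "404 NOT FOUND" rows.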
Answer 2 (score: 0)
Try this answer.
I hope this helps you:
./spark-shell --jars ~/spark/spark-cassandra-connector/spark-cassandra-connector/target/full/scala-2.10/spark-cassandra-connector-assembly-2.0.5-121-g1a7fa1f8.jar
import com.datastax.spark.connector._
val conf = new SparkConf(true).set("spark.cassandra.connection.host", "localhost")
val test = sc.cassandraTable("sensorkeyspace", "sensortable")
test.count
Answer 3 (score: 0)
I hope this helps. I used PostgreSQL 9.3.
select *
from (
with total as (select sum(1) as tot FROM log WHERE status = '404 NOT FOUND')
SELECT date_trunc('day',time) as "date",
cast(SUM(CASE WHEN status = '404 NOT FOUND' THEN 1 ELSE 0 END) as decimal) / total.tot as percentage
FROM log , total
group by date, total.tot
) t
where percentage > 0
ORDER BY percentage desc;
Answer 4 (score: -1)
Why don't you try this:
select
date_trunc('day',time) as "date",
count(time)::numeric / (select count(*) from log where status = '404 NOT FOUND') as errors -- numeric cast avoids integer division
from log
where status = '404 NOT FOUND'
GROUP BY date_trunc('day',time);
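One thing to watch with any ratio of two counts: both operands are integers, and PostgreSQL's integer division truncates, so 2 / 3 evaluates to 0 rather than 0.67. A minimal sketch of the effect, again using Python's sqlite3 as a stand-in (SQLite truncates integer division the same way; treating it as equivalent to PostgreSQL here is an assumption limited to this behaviour):

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Integer divided by integer: the fractional part is discarded.
(truncated,) = conn.execute("SELECT 2 / 3").fetchone()

# Casting one operand to a non-integer type keeps the fraction.
(exact,) = conn.execute("SELECT CAST(2 AS REAL) / 3").fetchone()

print(truncated)        # 0
print(round(exact, 2))  # 0.67
```

In PostgreSQL the equivalent cast is `::numeric`, `::decimal`, or `CAST(... AS double precision)` on either operand.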