Calculating a percentage in a SQL SELECT statement

Time: 2018-03-13 11:46:54

Tags: sql postgresql subquery aggregate-functions

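I'm new to SQL and am trying to write a query that shows the days on which more than 1% of requests to the website resulted in an error.

Example:

  • July 29, 2016 - 2.5% errors

I'm using a single table called log, and the errors are '404' HTTP errors stored in the status column. The table's columns are: log(id, time, status, method, ip, path)

So far I have cobbled together the following query. I use a subquery to list every date that has one or more errors. In the main query, I try to use the total count of log statuses for that day to calculate each day's error percentage.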

SELECT date(log.time), count(log.status) * 100 / subquery1.error_instance AS error FROM
  (SELECT date(log.time), count(log.status) AS error_instance
     FROM log
     WHERE status='404 NOT FOUND'
     GROUP BY log.time
     ORDER BY error_instance desc) subquery1
  JOIN log
     ON date(log.time) = subquery1.date
GROUP BY log.time
ORDER BY error

I keep getting the following error:

Column "subquery1.error_instance" must appear in the GROUP BY clause or be used in an aggregate function

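Edit: I have added the initial FROM to my example code block. Although it was missing from the original post, it was actually present in my query code, so that is not the problem.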

6 Answers:

Answer 0: (score: 0)

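You don't need a JOIN. You can use conditional aggregation:

SELECT l.*
FROM (SELECT date(log.time) AS day,
             COUNT(*) AS num_rows,
             -- a boolean cast to int is 1 for an error row and 0 otherwise
             SUM( (status = '404 NOT FOUND')::int) AS num_errors,
             AVG( (status = '404 NOT FOUND')::int) AS error_ratio
      -- no WHERE filter here: every request of the day must be counted,
      -- otherwise error_ratio would always be 1
      FROM log
      GROUP BY date(log.time)
     ) l
WHERE error_ratio > 0.01
ORDER BY error_ratio DESC;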

Note that this returns the result as a ratio between 0 and 1 rather than as a percentage. I find the ratio easier to work with.
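
If a percentage is preferred over a ratio, one possible variant is sketched below. It assumes the log table from the question and relies on the aggregate FILTER clause, which PostgreSQL has supported since version 9.4. The alias names day and error_pct are illustrative and do not come from the original post.

```sql
-- Sketch only: assumes log(id, time, status, method, ip, path) as in the question.
SELECT date(log.time) AS day,
       -- 100.0 forces numeric arithmetic, avoiding integer division
       round(100.0 * count(*) FILTER (WHERE status = '404 NOT FOUND') / count(*), 2) AS error_pct
FROM log
GROUP BY date(log.time)
-- keep only days where errors exceed 1% of that day's requests
HAVING count(*) FILTER (WHERE status = '404 NOT FOUND') > 0.01 * count(*)
ORDER BY error_pct DESC;
```

Filtering in HAVING rather than in an outer query keeps everything in a single SELECT; whether that reads better than the subquery form is a matter of taste.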

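Answer 1: (score: 0)

You are applying an aggregate function while also selecting everything from the table. Every field that is not inside an aggregate function must appear in the GROUP BY. To fix this, include all the fields from the log table in the GROUP BY clause, not just log.time.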
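
As a rough illustration of the rule this answer refers to, the sketch below assumes the log table from the question. In PostgreSQL it is generally enough to group by the non-aggregated expressions that appear in the select list, so for a per-day count only date(log.time) needs to be grouped. The aliases day and requests are illustrative.

```sql
-- Rejected by PostgreSQL: log.status is neither grouped nor aggregated.
-- SELECT date(log.time), log.status, count(*) FROM log GROUP BY date(log.time);

-- Accepted: the only non-aggregated expression, date(log.time), is grouped.
-- Grouping by log.time instead would also be accepted, but it would create
-- one group per distinct timestamp rather than one group per day.
SELECT date(log.time) AS day,
       count(*)       AS requests
FROM log
GROUP BY date(log.time);
```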

Answer 2: (score: 0)

   count(log.status) * 100 / subquery1.error--- This doesn't sound meaningful to me

It can be rewritten as:

SELECT date(log.time) AS day,
       count(log.status) * 100.0 / subquery1.total_requests AS error
FROM (SELECT date(log.time) AS day, count(log.status) AS total_requests
      FROM log
      GROUP BY date(log.time)) subquery1   -- all requests per day
JOIN log
  ON date(log.time) = subquery1.day
WHERE log.status = '404 NOT FOUND'         -- count only the error rows
GROUP BY date(log.time), subquery1.total_requests
ORDER BY error;

Answer 3: (score: 0)

Use

GROUP BY log.time,error
ORDER BY log.time,error

That might fix it.

Answer 4: (score: 0)

You are missing a FROM clause before subquery1:

SELECT date(log.time), count(log.status) * 100 / subquery1.error_instance AS error FROM
  (SELECT date(log.time), count(log.status) AS error_instance
     FROM log
     WHERE status='404 NOT FOUND'
     GROUP BY log.time
     ORDER BY error_instance desc) subquery1
  JOIN log
     ON date(log.time) = subquery1.date
GROUP BY log.time
ORDER BY error

Answer 5: (score: 0)

The problem seems to have been that I was attempting the following calculation in the SELECT statement of my outer query:

SELECT date(log.time), count(log.status) * 100 / subquery1.error_instance AS error

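The calculation combines an aggregate (count) with a non-aggregated column referenced from the subquery. As a result, I received an error whether or not I included error in the GROUP BY.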

Instead, I took inspiration from @Gordon Linoff and built a derived table containing total_views and error_instance columns, then performed a simpler calculation in my outer query:

SELECT date_column, sum(sq2.error_instance::FLOAT * 100 / sq2.total_views) AS error
FROM
    (SELECT date(log.time) AS date_column, count(log.status) AS total_views, sq1.error_instance
    FROM
        (SELECT date(log.time) AS date_column, count(log.status) AS error_instance
        FROM log
        WHERE status='404 NOT FOUND'
        GROUP BY date(log.time)
        ORDER BY error_instance desc) sq1
    JOIN log on date(log.time) = sq1.date_column
    GROUP BY date(log.time), sq1.error_instance
    ORDER BY total_views desc) sq2
GROUP BY date_column
ORDER BY error desc
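
The two-step idea above (first build per-day totals and error counts, then divide) can also be sketched with a common table expression. The sample rows below are hypothetical and exist only to make the sketch self-contained. The temporary table is named log to mirror the question, so within the session it will shadow any permanent table of the same name. The names daily and error_pct are illustrative.

```sql
-- Hypothetical sample data; column names follow the question's log table.
CREATE TEMP TABLE log (
    id     serial PRIMARY KEY,
    time   timestamp,
    status text
);

-- 2016-07-29: 39 successful requests plus 1 error (1 / 40 = 2.5%).
INSERT INTO log (time, status)
SELECT timestamp '2016-07-29 00:00:00' + n * interval '1 minute', '200 OK'
FROM generate_series(1, 39) AS n;
INSERT INTO log (time, status) VALUES ('2016-07-29 12:00:00', '404 NOT FOUND');

-- 2016-07-30: 199 successful requests plus 1 error (1 / 200 = 0.5%).
INSERT INTO log (time, status)
SELECT timestamp '2016-07-30 00:00:00' + n * interval '1 minute', '200 OK'
FROM generate_series(1, 199) AS n;
INSERT INTO log (time, status) VALUES ('2016-07-30 12:00:00', '404 NOT FOUND');

-- Step 1: one row per day with the total and the number of errors.
WITH daily AS (
    SELECT date(log.time) AS day,
           count(*) AS total_views,
           sum(CASE WHEN status = '404 NOT FOUND' THEN 1 ELSE 0 END) AS error_instance
    FROM log
    GROUP BY date(log.time)
)
-- Step 2: plain arithmetic on the pre-aggregated columns, no GROUP BY needed.
SELECT day,
       round(100.0 * error_instance / total_views, 2) AS error_pct
FROM daily
WHERE 100.0 * error_instance / total_views > 1
ORDER BY error_pct DESC;
```

With this sample data only 2016-07-29 should pass the 1% threshold, at 2.50, which matches the example given in the question.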