Postgres: need the count of distinct records

Time: 2018-06-01 19:56:06

Tags: postgresql distinct-on

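I have a table with duplicate entries, and the goal is to get the distinct entries based on the latest timestamp.

In my case 'serial_no' has duplicate entries, but I pick the unique entry for each one based on the latest timestamp.

The query below gives me the distinct results with the latest timestamp. My concern is that I also need the total number of unique entries.

For example, suppose my table has 40 entries in total. With the query below I get 20 unique rows based on serial number, but 'total' returns 40 instead of 20. Any help on this?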

  SELECT 
  * 
  FROM 
  (
    SELECT 
      DISTINCT ON (serial_no) id, 
      serial_no, 
      name, 
      timestamp,
      COUNT(*) OVER() as total 
    FROM 
      product_info 
      INNER JOIN my.account ON id = accountid 
    WHERE 
      lower(name) = 'hello' 
    ORDER BY 
      serial_no, 
      timestamp DESC OFFSET 0 
    LIMIT 
      10
  ) AS my_info 
 ORDER BY 
   serial_no asc


The product_info table initially has this data:

serial_no           name         timestamp                              

11212               pulp12      2018-06-01 20:00:01             
11213               mango       2018-06-01 17:00:01             
11214               grapes      2018-06-02 04:00:01             
11215               orange      2018-06-02 07:05:30             
11212               pulp12      2018-06-03 14:00:01             
11213               mango       2018-06-03 13:00:00             



After the DISTINCT ON query I got all unique results based on the latest
timestamp:

serial_no       name        timestamp                   total

11212           pulp12     2018-06-03 14:00:01            6
11213           mango      2018-06-03 13:00:00            6
11214           grapes     2018-06-02 04:00:01            6
11215           orange     2018-06-02 07:05:30            6


But total shows up as 6. I wanted the total to be 4, since the result has
only 4 unique entries.

I am not sure how to modify my existing query to get this desired 
result.

3 answers:

Answer 0 (score: 2)

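What you can do is move the window function to a higher-level select statement. This is needed because window functions are evaluated before the DISTINCT ON and LIMIT clauses are applied. Also, you cannot include the DISTINCT keyword within a window function: it has not been implemented yet (as of Postgres 9.6).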

 SELECT 
  *,
  COUNT(*) OVER() as total -- here
 FROM 
  (
    SELECT 
      DISTINCT ON (serial_no) id, 
      serial_no, 
      name, 
      timestamp
    FROM 
      product_info 
      INNER JOIN my.account ON id = accountid 
    WHERE 
      lower(name) = 'hello' 
    ORDER BY 
      serial_no, 
      timestamp DESC
    LIMIT 
      10
  ) AS my_info

Also, the OFFSET is not needed, and the extra sort is redundant. I removed both.
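
The evaluation order described above can be checked with a small script. This is only a sketch built from the sample data in the question: the column types are assumptions, the temporary table shadows any real product_info for the session, and the join and name filter are left out to keep it self-contained.

```sql
-- Assumed column types; the question does not show the table definition.
CREATE TEMP TABLE product_info (
    serial_no integer,
    name      text,
    timestamp timestamp
);

INSERT INTO product_info (serial_no, name, timestamp) VALUES
    (11212, 'pulp12', '2018-06-01 20:00:01'),
    (11213, 'mango',  '2018-06-01 17:00:01'),
    (11214, 'grapes', '2018-06-02 04:00:01'),
    (11215, 'orange', '2018-06-02 07:05:30'),
    (11212, 'pulp12', '2018-06-03 14:00:01'),
    (11213, 'mango',  '2018-06-03 13:00:00');

-- Window function in the same SELECT as DISTINCT ON:
-- it is computed over all 6 rows before duplicates are dropped.
SELECT DISTINCT ON (serial_no)
       serial_no, name, timestamp,
       COUNT(*) OVER () AS total          -- 6 on every row
FROM product_info
ORDER BY serial_no, timestamp DESC;

-- Window function one level up:
-- it only sees the 4 rows that survive DISTINCT ON.
SELECT my_info.*,
       COUNT(*) OVER () AS total          -- 4 on every row
FROM (
    SELECT DISTINCT ON (serial_no)
           serial_no, name, timestamp
    FROM product_info
    ORDER BY serial_no, timestamp DESC
) AS my_info;
```

Note that if the inner query keeps a LIMIT, the outer count can never exceed that limit, because it counts only the rows the subquery returns.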

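Another approach is to include a computed column in the select clause, but this would not be as fast, since it requires one more scan of the table. This obviously assumes that your total is strictly tied to your result set, rather than to what is stored in the table but gets filtered out.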
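
A possible shape for that computed-column alternative is a scalar subquery using COUNT(DISTINCT ...). This is a sketch, not tested against the asker's schema: it assumes the join and filter from the question, and repeats them inside the subquery so that the total matches the filtered set. Unlike the outer window count, this total is not capped by LIMIT.

```sql
SELECT DISTINCT ON (serial_no)
       id,
       serial_no,
       name,
       timestamp,
       -- Uncorrelated scalar subquery: costs one extra pass over the data.
       (SELECT COUNT(DISTINCT serial_no)
          FROM product_info
               INNER JOIN my.account ON id = accountid
         WHERE lower(name) = 'hello') AS total
FROM product_info
     INNER JOIN my.account ON id = accountid
WHERE lower(name) = 'hello'
ORDER BY serial_no, timestamp DESC
LIMIT 10;
```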

Answer 1 (score: 0)

select count(*), serial_no from product_info group by serial_no

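will give you the number of duplicates for each serial number.

The most mindless way of incorporating that information would be to join in a subquery: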

  SELECT 
  * 
  FROM 
  (
    SELECT 
      DISTINCT ON (serial_no) id, 
      serial_no, 
      name, 
      timestamp,
      COUNT(*) OVER() as total 
    FROM 
      product_info 
      INNER JOIN my.account ON id = accountid 
    WHERE 
      lower(name) = 'hello' 
    ORDER BY 
      serial_no, 
      timestamp DESC OFFSET 0 
    LIMIT 
      10
  ) AS my_info
  join (select count(*) as counts, serial_no from product_info group by serial_no) as X
  on X.serial_no = my_info.serial_no
 ORDER BY 
   my_info.serial_no asc

Answer 2 (score: 0)

Postgres supports COUNT(DISTINCT column_name), so, if I have understood your request, using that instead of COUNT(*) should work, and you can drop the OVER.
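
As a rough illustration of that suggestion, a plain aggregate with COUNT(DISTINCT ...) could look like the sketch below. It returns only the count, not the rows themselves, and it leaves out the join and filter from the question, which would need to be added back for the real query.

```sql
-- Plain aggregate, no OVER clause: one row holding the number of unique serial numbers.
SELECT COUNT(DISTINCT serial_no) AS total   -- 4 for the sample data in the question
FROM product_info;
```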