Postgres: need to get row counts by uniqueness

Time: 2019-03-20 18:16:54

Tags: sql postgresql

I have a simple table containing longitude, latitude, and a time. Basically, I want a query whose result gives me this:

lat,long,hourwindow,count

I can't seem to figure out how to do this. I've tried a lot of things and can't keep them straight. Unfortunately, this is all I have so far:

WITH all_lat_long_by_time AS (
    SELECT
      trunc(cast(lat AS NUMERIC), 4) AS lat,
      trunc(cast(long AS NUMERIC), 4) AS long,
      date_trunc('hour', time :: TIMESTAMP WITHOUT TIME ZONE) AS hourWindow

    FROM my_table
),
    unique_lat_long_by_time AS (
      SELECT DISTINCT * FROM all_lat_long_by_time
  ),
  all_with_counts AS (
   -- what do I do here?
  )
SELECT * FROM all_with_counts;

2 answers:

Answer 0: (score: 1)

I think this is a very basic aggregation query:

SELECT date_trunc('hour', time :: TIMESTAMP WITHOUT TIME ZONE) AS hourWindow,
       trunc(cast(lat AS NUMERIC), 4) AS lat,
       trunc(cast(long AS NUMERIC), 4) AS long,
       COUNT(*)
FROM my_table
GROUP BY hourWindow, trunc(cast(lat AS NUMERIC), 4), trunc(cast(long AS NUMERIC), 4)
ORDER BY hourWindow;
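
To see what this aggregation returns, here is a minimal self-contained sketch. The four sample rows are made up for illustration and stand in for my_table; the column names follow the question.

```sql
-- Hypothetical sample rows standing in for my_table (not from the question).
WITH my_table(lat, long, time) AS (
   VALUES (52.520012, 13.404954, timestamp '2019-03-20 18:05:00')
        , (52.520047, 13.404921, '2019-03-20 18:40:00')
        , (48.856613,  2.352222, '2019-03-20 18:59:59')
        , (52.520012, 13.404954, '2019-03-20 19:10:00')
)
SELECT date_trunc('hour', time::timestamp) AS hour_window
     , trunc(lat::numeric, 4)              AS lat
     , trunc(long::numeric, 4)             AS long
     , count(*)                            AS count   -- rows per coordinate per hour
FROM   my_table
GROUP  BY 1, 2, 3
ORDER  BY 1, 2, 3;
```

With these rows, the first two truncate to the same coordinate (52.5200, 13.4049) inside the 18:00 window, so that group gets a count of 2. The other two groups each get a count of 1.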

Answer 1: (score: 0)

If "row count by uniqueness" means counting the distinct coordinates per hour (after truncating the numbers), count(DISTINCT (lat,long)) does the job:

SELECT date_trunc('hour', time::timestamp) AS hour_window
     , count(DISTINCT (trunc( lat::numeric, 4)
                     , trunc(long::numeric, 4))) AS count_distinct_coordinates
FROM   tbl
GROUP  BY 1
ORDER  BY 1;

Details are in the manual.
(lat,long) is a ROW value, short for ROW(lat,long).
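
As a small illustration of the row-value shorthand, the following sketch uses a made-up inline table t(a, b), which is not part of the question:

```sql
-- The parenthesized list and the explicit ROW constructor build the same value.
SELECT (1, 2) = ROW(1, 2) AS same_row_value;

-- count(DISTINCT ...) over a row value counts distinct (a, b) pairs.
-- t(a, b) is a hypothetical inline table used only for this sketch.
SELECT count(DISTINCT (a, b)) AS distinct_pairs
FROM  (VALUES (1, 2), (1, 2), (1, 3)) AS t(a, b);
```

The first statement returns true. The second returns 2, because the pair (1, 2) appears twice but is counted once.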

But count(DISTINCT ...) is typically slow; in your case a subquery should be faster:

SELECT hour_window, count(*) AS count_distinct_coordinates
FROM  (
   SELECT date_trunc('hour', time::timestamp) AS hour_window
        , trunc( lat::numeric, 4) AS lat
        , trunc(long::numeric, 4) AS long
   FROM   tbl
   GROUP  BY 1, 2, 3
   ) sub
GROUP  BY 1
ORDER  BY 1;

Or:

SELECT hour_window, count(*) AS count_distinct_coordinates
FROM  (
   SELECT DISTINCT
          date_trunc('hour', time::timestamp) AS hour_window
        , trunc( lat::numeric, 4) AS lat
        , trunc(long::numeric, 4) AS long
   FROM   tbl
   ) sub
GROUP  BY 1
ORDER  BY 1;

Once the subquery has folded the duplicates, the outer SELECT can use a plain count(*).
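
A minimal sketch of the subquery variant on made-up data, in case it helps to check the logic. The four rows below are hypothetical and stand in for tbl:

```sql
-- Hypothetical sample rows standing in for tbl (not from the question).
WITH tbl(lat, long, time) AS (
   VALUES (52.520012, 13.404954, timestamp '2019-03-20 18:05:00')
        , (52.520047, 13.404921, '2019-03-20 18:40:00')
        , (48.856613,  2.352222, '2019-03-20 18:59:59')
        , (52.520012, 13.404954, '2019-03-20 19:10:00')
)
SELECT hour_window, count(*) AS count_distinct_coordinates
FROM  (
   -- Inner query: one row per (hour, truncated lat, truncated long).
   SELECT DISTINCT
          date_trunc('hour', time::timestamp) AS hour_window
        , trunc( lat::numeric, 4) AS lat
        , trunc(long::numeric, 4) AS long
   FROM   tbl
   ) sub
GROUP  BY 1
ORDER  BY 1;
```

With these rows the 18:00 window contains two distinct truncated coordinates and the 19:00 window contains one, so the result is 2 and 1 respectively.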