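Transform long rows to wide, filling all cells

Time: 2014-05-08 04:17:16

Tags: sql postgresql postgresql-9.1 crosstab generate-series

I have long-format data on businesses, with one row for each move to a different location, keyed on business ID. Any one business can have multiple move events.

I want to reshape it to wide format, which is typically the territory of crosstab() from the tablefunc module.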

+-------------+-----------+---------+---------+
| business_id | year_move |  long   |   lat   |
+-------------+-----------+---------+---------+
|   001013580 |      1991 | 71.0557 | 42.3588 |
|   001015924 |      1993 | 71.0728 | 42.3504 |
|   001015924 |      1996 | -122.28 | 37.654  |
|   001020684 |      1992 | 84.3381 | 33.5775 |
+-------------+-----------+---------+---------+

Then I transform it like this:

SELECT longbyyear.*
FROM crosstab($$
    SELECT 
    business_id, 
    year_move, 
    max(longitude::float)
    from business_moves
    where year_move::int between 1991 and 2010 
    group by business_id, year_move
    order by business_id, year_move;
    $$
) 
AS longbyyear(biz_id character varying, "long91" float,"long92" float,"long93" float,"long94" float,"long95" float,"long96" float,"long97" float, "long98" float, "long99" float,"long00" float,"long01" float,
"long02" float,"long03" float,"long04" float,"long05" float, 
"long06" float, "long07" float, "long08" float, "long09" float, "long10" float);

That mostly gets me to the desired output:

+---------+----------+----------+----------+--------+---+--------+--------+--------+
| biz_id  |  long91  |  long92  |  long93  | long94 | … | long08 | long09 | long10 |
+---------+----------+----------+----------+--------+---+--------+--------+--------+
| 1000223 | 121.3784 | 121.3063 | 121.3549 | 82.821 | … |        |        |        |
| 1000678 | 118.224  |          |          |        | … |        |        |        |
| 1002158 | 121.98   |          |          |        | … |        |        |        |
| 1004092 | 71.2384  |          |          |        | … |        |        |        |
| 1007801 | 118.0312 |          |          |        | … |        |        |        |
| 1007855 | 71.1769  |          |          |        | … |        |        |        |
| 1008697 | 71.0394  | 71.0358  |          |        | … |        |        |        |
| 1008986 | 71.1013  |          |          |        | … |        |        |        |
| 1009617 | 119.9965 |          |          |        | … |        |        |        |
+---------+----------+----------+----------+--------+---+--------+--------+--------+

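The only hitch is that I would ideally have populated values for every year, not just values in the years of a move. So all fields would be populated, with a value for each year and the most recent address carried over to the next year. I could hack this with manual updates (if a column is blank, use the preceding column). I am just wondering whether there is a clever way to do it with the crosstab() function or some other way, possibly combined with a custom function.

2 answers:

Answer 0: (score: 2)

To get the current location of every business_id in any given year, you need two things:

  1. A parameterized query to select on the year, implemented as an SQL-language function.
  2. A dirty trick to aggregate on the year, group by business_id, and leave the coordinates untouched. This is done with a sub-query in a CTE.

The function looks like this: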

    CREATE FUNCTION business_location_in_year_x (int) RETURNS SETOF business_moves AS $$
      WITH last_move AS (
        SELECT business_id, MAX(year_move) AS yr
        FROM business_moves
        WHERE year_move <= $1
        GROUP BY business_id)
      SELECT lm.business_id, $1::int AS yr, longitude, latitude
      FROM business_moves bm, last_move lm
      WHERE bm.business_id = lm.business_id
      AND bm.year_move = lm.yr;
    $$ LANGUAGE sql;
    

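The sub-query selects only the most recent move for each business. The main query then adds the longitude and latitude columns and puts the requested year in the returned table, rather than the year in which the most recent move took place. One caveat: this table needs a record giving the establishment and initial location of each business_id, otherwise a business will not show up until after it has moved somewhere else.

Call this function with the usual SELECT * FROM business_location_in_year_x(1997). See also the SQL fiddle.

If you really need a crosstab, you can tweak this code to give you the business locations for a range of years and then feed that into the crosstab() function.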
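
One way to produce locations for a range of years is sketched below. It does not call the function above. Instead it joins generate_series() against business_moves and keeps, per business and year, the most recent move up to that year with DISTINCT ON. It assumes that year_move is an integer column and that the table has longitude and latitude columns as in the function. The 1991 to 2010 range is taken from the question.

```sql
-- One row per (business_id, year): the latest move on or before that year.
-- Assumes business_moves(business_id, year_move int, longitude, latitude).
SELECT DISTINCT ON (bm.business_id, y.yr)
       bm.business_id
     , y.yr
     , bm.longitude
     , bm.latitude
FROM   generate_series(1991, 2010) AS y(yr)
JOIN   business_moves bm ON bm.year_move <= y.yr
ORDER  BY bm.business_id, y.yr, bm.year_move DESC;
```

Years before the first recorded move of a business produce no row, which is the same caveat as for the function.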

Answer 1: (score: 2)

I am assuming you have actual dates for every business move, so we can make a meaningful pick per year:

CREATE TEMP TABLE business_moves (
  business_id int,  -- why would you use inefficient varchar here?
  move_date date,
  longitude float,
  latitude float);

Building on that, a more meaningful test case:

INSERT INTO business_moves VALUES 
  (001013580, '1991-1-1', 71.0557, 42.3588),
  (001015924, '1993-1-1', 71.0728, 42.3504),
  (001015924, '1993-3-3', 73.0728, 43.3504),  -- 2nd move this year
  (001015924, '1996-1-1', -122.28, 37.654),
  (001020684, '1992-1-1', 84.3381, 33.5775);

Complete, fast solution

SELECT *
FROM crosstab($$
   SELECT business_id, year
        , first_value(x) OVER (PARTITION BY business_id, grp ORDER BY year) AS x
   FROM  (
      SELECT *
           , count(x) OVER (PARTITION BY business_id ORDER BY year) AS grp
      FROM  (SELECT DISTINCT business_id FROM business_moves) b
      CROSS  JOIN generate_series(1991, 2010) year
      LEFT   JOIN (
         SELECT DISTINCT ON (1,2)
                business_id
              , EXTRACT('year' FROM move_date)::int AS year
              , point(longitude, latitude) AS x
         FROM   business_moves
         WHERE  move_date >= '1991-1-1'
         AND    move_date <  '2011-1-1'
         ORDER  BY 1,2, move_date DESC
         ) bm USING (business_id, year)
      ) sub
   $$
   ,'VALUES
    (1991),(1992),(1993),(1994),(1995),(1996),(1997),(1998),(1999),(2000)
   ,(2001),(2002),(2003),(2004),(2005),(2006),(2007),(2008),(2009),(2010)'
    ) AS t(biz_id int
         , x91 point, x92 point, x93 point, x94 point, x95 point
         , x96 point, x97 point, x98 point, x99 point, x00 point
         , x01 point, x02 point, x03 point, x04 point, x05 point
         , x06 point, x07 point, x08 point, x09 point, x10 point);

Result:

 biz_id  |        x91        |        x92        |        x93        |        x94        |        x95        |        x96        |        x97        ...
---------+-------------------+-------------------+-------------------+-------------------+-------------------+-------------------+-------------------
 1013580 | (71.0557,42.3588) | (71.0557,42.3588) | (71.0557,42.3588) | (71.0557,42.3588) | (71.0557,42.3588) | (71.0557,42.3588) | (71.0557,42.3588) ...
 1015924 |                   |                   | (73.0728,43.3504) | (73.0728,43.3504) | (73.0728,43.3504) | (-122.28,37.654)  | (-122.28,37.654)  ...
 1020684 |                   | (84.3381,33.5775) | (84.3381,33.5775) | (84.3381,33.5775) | (84.3381,33.5775) | (84.3381,33.5775) | (84.3381,33.5775) ...

Step by step

Step 1

Fix what you had:

SELECT *
FROM crosstab($$
   SELECT DISTINCT ON (1,2)
          business_id
        , EXTRACT('year' FROM move_date) AS year
        , point(longitude, latitude) AS long_lat
   FROM   business_moves
   WHERE  move_date >= '1991-1-1'
   AND    move_date <  '2011-1-1'
   ORDER  BY 1,2, move_date DESC
   $$
   ,'VALUES
    (1991),(1992),(1993),(1994),(1995),(1996),(1997),(1998),(1999),(2000)
   ,(2001),(2002),(2003),(2004),(2005),(2006),(2007),(2008),(2009),(2010)'
   ) AS t(biz_id int
        , x91 point, x92 point, x93 point, x94 point, x95 point
        , x96 point, x97 point, x98 point, x99 point, x00 point
        , x01 point, x02 point, x03 point, x04 point, x05 point
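        , x06 point, x07 point, x08 point, x09 point, x10 point);

  • You want lat & lon to make it meaningful, so form a point from both. Alternatively, you could just concatenate the text representations.

  • You may want more data. Use DISTINCT ON instead of max() to get the latest (complete) row per year. Details:
    Select first row in each GROUP BY group?

  • As long as values may be missing anywhere in the grid, you must use the crosstab() variant with two parameters. Detailed explanation here:
    PostgreSQL Crosstab Query

  • The function is adapted to work with move_date date instead of year_move.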

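Step 2

Address your request:

"I would ideally have populated values for every year"

Use a CROSS JOIN of businesses and years to build a complete grid of values (one cell per business and year):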

SELECT *
FROM  (SELECT DISTINCT business_id FROM business_moves) b
CROSS  JOIN generate_series(1991, 2010) year
LEFT   JOIN (
   SELECT DISTINCT ON (1,2)
          business_id
        , EXTRACT('year' FROM move_date)::int AS year
        , point(longitude, latitude) AS x
   FROM   business_moves
   WHERE  move_date >= '1991-1-1'
   AND    move_date <  '2011-1-1'
   ORDER  BY 1,2, move_date DESC
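   ) bm USING (business_id, year)

  • The set of years comes from a generate_series() call.

  • The distinct businesses come from a separate SELECT. You might have a table of businesses that you could use instead (and cheaper)? That would also account for businesses that never moved.

  • LEFT JOIN to the actual business moves per year to arrive at a complete grid of values.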
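
If a table of businesses exists, the grid could be built from it directly. A minimal sketch, assuming a hypothetical table businesses with a business_id column (neither is given in the question):

```sql
-- "businesses" is a hypothetical table with one row per business,
-- including businesses that never moved.
SELECT b.business_id, year
FROM   businesses b
CROSS  JOIN generate_series(1991, 2010) year;
```

The LEFT JOIN to the per-year moves would then attach to this grid in the same way as in the query above.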

Step 3

Fill in defaults:

"the most recent address carried over to the next year"

SELECT business_id, year
     , COALESCE(first_value(x) OVER (PARTITION BY business_id, grp ORDER BY year)
               ,'(0,0)') AS x
FROM  (
   SELECT *, count(x) OVER (PARTITION BY business_id ORDER BY year) AS grp
   FROM  (SELECT DISTINCT business_id FROM business_moves) b
   CROSS  JOIN generate_series(1991, 2010) year
   LEFT   JOIN (
      SELECT DISTINCT ON (1,2)
             business_id
           , EXTRACT('year' FROM move_date)::int AS year
           , point(longitude, latitude) AS x
      FROM   business_moves
      WHERE  move_date >= '1991-1-1'
      AND    move_date <  '2011-1-1'
      ORDER  BY 1,2, move_date DESC
      ) bm USING (business_id, year)
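   ) sub;

  • In the subquery sub, built on the query from step 2, form groups (grp) of cells that share the same location.

    For this purpose, use the well-known aggregate function count() as a window aggregate function. NULL values are not counted, so the value increases with every actual move, thereby forming groups of cells that share the same location.

  • In the outer query, pick the first value per group for each row in the same group using the window function first_value(). Voilà.

  • To top it off, optionally (!) wrap that in COALESCE to fill the remaining cells with unknown location (no move yet) with (0,0). If you do that, there are no NULL values left, and you can use the simpler form of crosstab(). That is a matter of taste.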
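
The grouping trick can be seen in isolation on a toy input. The sketch below is not part of the solution: it uses a single made-up business, five years, and the placeholder strings 'A' and 'B' in place of point values, with moves recorded only in 1992 and 1994.

```sql
-- Toy data: NULL means "no move recorded in that year".
SELECT year, x, grp
     , first_value(x) OVER (PARTITION BY grp ORDER BY year) AS filled
FROM  (
   SELECT year, x
        , count(x) OVER (ORDER BY year) AS grp  -- NULLs are not counted
   FROM  (
      VALUES (1991, NULL::text)
           , (1992, 'A')
           , (1993, NULL)
           , (1994, 'B')
           , (1995, NULL)
      ) t(year, x)
   ) sub
ORDER  BY year;
-- year | x | grp | filled
-- 1991 |   |   0 |
-- 1992 | A |   1 | A
-- 1993 |   |   1 | A
-- 1994 | B |   2 | B
-- 1995 |   |   2 | B
```

The running count stays flat across NULL rows and steps up at each non-NULL value, so every group starts with the row that holds the location. Group 0 (years before the first move) has nothing to carry forward and stays NULL unless COALESCE is applied.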

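SQL Fiddle with the base queries. SQL Fiddle does not currently have crosstab() installed.

Step 4

Use the query from step 3 in the updated crosstab() call. All in all, this should be fast. Indexes may help.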