Select preceding rows based on a condition in the current row

Time: 2014-02-28 16:25:16

Tags: sql postgresql

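The dataset contains daily (business-day) time series for different companies. There is also an indicator variable (ind) that takes the value 1 or 0. If ind is 1 for a given company, I want to build a subsample of the dataset containing all entries for that company within a certain time range before the indicator event.

Consider the following sample data: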

  day              company    ind          
  2012-01-11       A          0            
  2012-01-11       B          0            
  2012-01-11       C          0            
  2012-01-12       A          0            
  2012-01-12       B          0            
  2012-01-12       C          0            
  2012-01-13       A          0            
  2012-01-13       B          1            
  2012-01-13       C          0            
  2012-01-16       A          0            
  2012-01-16       B          0            
  2012-01-16       C          0            
  2012-01-17       A          1            
  2012-01-17       B          0            
  2012-01-17       C          0            
  2012-01-18       A          0            
  2012-01-18       B          1            
  2012-01-18       C          0 
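For anyone who wants to reproduce this, here is a minimal setup sketch. The table name mytable matches the query in the question; the column types, the CHECK constraint, and the primary key are assumptions, since only the data itself is shown.

```sql
-- Assumed schema: types and constraints are guesses based on the sample rows.
CREATE TABLE mytable (
    day     date    NOT NULL,
    company text    NOT NULL,
    ind     integer NOT NULL CHECK (ind IN (0, 1)),
    PRIMARY KEY (company, day)
);

-- The 18 sample rows, copied from the table above.
INSERT INTO mytable (day, company, ind) VALUES
    ('2012-01-11', 'A', 0), ('2012-01-11', 'B', 0), ('2012-01-11', 'C', 0),
    ('2012-01-12', 'A', 0), ('2012-01-12', 'B', 0), ('2012-01-12', 'C', 0),
    ('2012-01-13', 'A', 0), ('2012-01-13', 'B', 1), ('2012-01-13', 'C', 0),
    ('2012-01-16', 'A', 0), ('2012-01-16', 'B', 0), ('2012-01-16', 'C', 0),
    ('2012-01-17', 'A', 1), ('2012-01-17', 'B', 0), ('2012-01-17', 'C', 0),
    ('2012-01-18', 'A', 0), ('2012-01-18', 'B', 1), ('2012-01-18', 'C', 0);
```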

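My goal is a subsample that contains, for the event companies A and B, the time range (-2 days to -1 day) before their respective events (making sure that the company has no other event within that range). This is the result I want: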

  day              company    ind         
  2012-01-11       B          0            
  2012-01-12       B          0            
  2012-01-13       A          0            
  2012-01-13       B          0            
  2012-01-16       A          0            
  2012-01-16       B          0            
  2012-01-17       B          0 

If the dataset contains only one company with a single indicator event, the following code works:

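    -- Number each company's rows chronologically.
    CREATE TABLE temp AS
    SELECT row_number() OVER (PARTITION BY company ORDER BY day) AS rowid, *
    FROM   mytable;

    -- Keep the two rows before the event. The scalar subqueries assume that
    -- exactly one row in temp has ind = 1.
    -- WINDOW is a reserved word in PostgreSQL, so the table is named event_window.
    CREATE TABLE event_window AS
    SELECT *
    FROM   temp t1
    WHERE  company IN (SELECT company
                       FROM   temp t2
                       WHERE  t2.ind = 1)
    AND    rowid BETWEEN (SELECT rowid FROM temp WHERE ind = 1) - 2
                     AND (SELECT rowid FROM temp WHERE ind = 1) - 1;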

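But I'm really struggling to extend this to the case of several event companies, each of which may have several events, as in the sample dataset.

Do you have any ideas on how to solve this?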

1 Answer:

Answer 0: (score: 3)

Since you partition by company in your attempt, I assume you don't want the following row in the result:

    2012-01-13       B          0

If that is the case, you can use LEAD() to look ahead 1 or 2 rows and check whether the ind flag is set:

    WITH cte AS (
        SELECT *
             , LEAD(ind)    OVER (PARTITION BY company ORDER BY day) AS Lead1
             , LEAD(ind, 2) OVER (PARTITION BY company ORDER BY day) AS Lead2
        FROM   Table1
    )
    SELECT Day, Company, Ind
    FROM   cte
    WHERE  Lead1 = 1
       OR  Lead2 = 1
    ORDER  BY day, company;

Demo: SQL Fiddle
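To make it easier to see what the two LEAD() calls return, here is a small sketch that inlines company B's rows from the sample data with VALUES, so it runs without any table. The CTE name b and the lowercase aliases lead1 and lead2 are only for illustration; PARTITION BY is dropped because there is a single company.

```sql
-- Company B's rows from the sample data, inlined so no table is required.
WITH b (day, ind) AS (
    VALUES (DATE '2012-01-11', 0),
           (DATE '2012-01-12', 0),
           (DATE '2012-01-13', 1),
           (DATE '2012-01-16', 0),
           (DATE '2012-01-17', 0),
           (DATE '2012-01-18', 1)
)
SELECT day
     , ind
     , LEAD(ind)    OVER (ORDER BY day) AS lead1   -- ind of the next row
     , LEAD(ind, 2) OVER (ORDER BY day) AS lead2   -- ind two rows ahead
FROM   b
ORDER  BY day;

-- Result:
--     day     | ind | lead1 | lead2
--  2012-01-11 |  0  |   0   |   1
--  2012-01-12 |  0  |   1   |   0
--  2012-01-13 |  1  |   0   |   0
--  2012-01-16 |  0  |   0   |   1
--  2012-01-17 |  0  |   1   | NULL
--  2012-01-18 |  1  | NULL  | NULL
```

With the filter Lead1 = 1 OR Lead2 = 1, the rows for 2012-01-11, 2012-01-12, 2012-01-16 and 2012-01-17 are kept. The event rows themselves are dropped here only because no other event follows them within two rows.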

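Update: For a larger range this approach is better, because you can specify how many rows ahead to look (the demo has been updated to include both):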

    WITH cte AS (
        SELECT *
             , MAX(ind) OVER (PARTITION BY company ORDER BY day
                              ROWS BETWEEN 1 FOLLOWING AND 2 FOLLOWING) AS Lead1
        FROM   Table1
    )
    SELECT Day, Company, Ind
    FROM   cte
    WHERE  Lead1 = 1
    ORDER  BY day, company;
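As a rough illustration of widening the range, the sketch below looks 1 to 3 rows ahead, again on company B's rows inlined with VALUES. The alias event_ahead and the extra ind = 0 filter are my own additions, not part of the query above: once the frame is wider than the gap between two events, an event row can itself be flagged, and the filter is one possible way to drop it. Note that ROWS counts rows, not calendar days, so this relies on each company having exactly one row per business day.

```sql
WITH b (day, ind) AS (
    VALUES (DATE '2012-01-11', 0),
           (DATE '2012-01-12', 0),
           (DATE '2012-01-13', 1),
           (DATE '2012-01-16', 0),
           (DATE '2012-01-17', 0),
           (DATE '2012-01-18', 1)
),
cte AS (
    SELECT *
           -- 1 if any of the next three rows is an event, NULL if no rows follow
         , MAX(ind) OVER (ORDER BY day
                          ROWS BETWEEN 1 FOLLOWING AND 3 FOLLOWING) AS event_ahead
    FROM   b
)
SELECT day, ind
FROM   cte
WHERE  event_ahead = 1
  AND  ind = 0          -- assumption: event rows should not be returned
ORDER  BY day;

-- Returns 2012-01-11, 2012-01-12, 2012-01-16 and 2012-01-17.
-- Without "AND ind = 0", 2012-01-13 would also be returned, because the
-- event on 2012-01-18 lies three rows ahead of it.
```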