Calculate the time difference between different records

Time: 2017-02-22 20:13:32

Tags: postgresql amazon-web-services amazon-redshift

I have a dataset that looks like this:

[Image: screenshot of the dataset]

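For each OwnerID, I want to calculate the difference between the creationtime column of the current record and that of the next record (for the same ownerID), and store it in a new column, TimeDiff. I believe a self-join is needed here, but I am not sure how to use a self-join to calculate the difference between the current record and the next one.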

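When doing this, the last record for any ownerID should get the default value 'NA', because there is no next record (for the same ownerID) to calculate the difference against.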

Here is the query I use to get this dataset:

    SELECT DISTINCT ga.ownerid,
           mr.name,
           SPLIT_PART(SPLIT_PART(ga.activitydata,' ',2),',',1) AS Assignmentid,
           EXTRACT(YEAR FROM ga.creationtime) AS YEAR,
           EXTRACT(MONTH FROM ga.creationtime) AS MONTH,
           EXTRACT(DAY FROM ga.creationtime) AS DAY,
           EXTRACT(DOW FROM ga.creationtime) AS DOW,
           ga.creationtime,
           a.encodedid,
           a.name
    FROM flx2.groupactivities ga
      JOIN flx2.memberstudytrackitemstatus mstis ON SPLIT_PART (SPLIT_PART (ga.activitydata,' ',2),',',1) = mstis.assignmentid
      JOIN flx2.artifacts a ON mstis.studytrackitemid = a.id
      JOIN auth.memberhasroles mhr ON mhr.memberid = ga.ownerid
      JOIN flx2.memberroles mr ON mr.id = mhr.roleid
    WHERE ga.activitytype = 'assign'
    AND   ga.ownerid NOT IN (SELECT memberid FROM auth.memberhasroles WHERE roleid = 25)
    AND   a.artifacttypeid = 54
    AND   a.encodedid IS NOT NULL
    ORDER BY ga.ownerid,
             ga.creationtime,
             a.encodedid

I am using Amazon Redshift to get this data.

Any help would be greatly appreciated.

TIA!

Update

I used the approach suggested by @systemjack. Here is the result I got:

[Image: screenshot of the query result]

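We can clearly see here that the encodedid column is repeated for an assignmentID (MAT.PRB.410, as highlighted in the image above), which should not be the case. With the query shown earlier, without the LEAD function, this does not happen. Here is the updated query I am using (the only change is one additional LEAD function):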

SELECT DISTINCT ga.ownerid,
       mr.name,
       SPLIT_PART(SPLIT_PART(ga.activitydata,' ',2),',',1) AS Assignmentid,
       EXTRACT(YEAR FROM ga.creationtime) AS YEAR,
       EXTRACT(MONTH FROM ga.creationtime) AS MONTH,
       EXTRACT(DAY FROM ga.creationtime) AS DAY,
       EXTRACT(DOW FROM ga.creationtime) AS DOW,
       ga.creationtime,
       LEAD(ga.creationtime,1) OVER (PARTITION BY ga.ownerid ORDER BY ga.creationtime) AS nexttime,
       a.encodedid,
       a.name
FROM flx2.groupactivities ga
  JOIN flx2.memberstudytrackitemstatus mstis ON SPLIT_PART (SPLIT_PART (ga.activitydata,' ',2),',',1) = mstis.assignmentid
  JOIN flx2.artifacts a ON mstis.studytrackitemid = a.id
  JOIN auth.memberhasroles mhr ON mhr.memberid = ga.ownerid
  JOIN flx2.memberroles mr ON mr.id = mhr.roleid
WHERE ga.activitytype = 'assign'
AND   ga.ownerid NOT IN (SELECT memberid FROM auth.memberhasroles WHERE roleid = 25)
AND   a.artifacttypeid = 54
AND   a.encodedid IS NOT NULL
ORDER BY ga.ownerid,
         ga.creationtime,
         a.encodedid LIMIT 1000

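The values in the nexttime column also seem to be off. It appears to pick up the next value from the creationtime column only on occasion. For example, in the second record the value of the nexttime column should be 2013-09-18 06:14:59 rather than 2014-01-18 12:16:49.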

Why are we getting more records than expected? How can I fix these issues?

2 Answers:

Answer 0: (score: 2)

Update: Does this look better?

with dataset as (
    SELECT DISTINCT ga.ownerid,
        mr.name,
        SPLIT_PART(SPLIT_PART(ga.activitydata,' ',2),',',1) AS Assignmentid,
        EXTRACT(YEAR FROM ga.creationtime) AS YEAR,
        EXTRACT(MONTH FROM ga.creationtime) AS MONTH,
        EXTRACT(DAY FROM ga.creationtime) AS DAY,
        EXTRACT(DOW FROM ga.creationtime) AS DOW,
        ga.creationtime,
        a.encodedid,
        a.name
    FROM flx2.groupactivities ga
    JOIN flx2.memberstudytrackitemstatus mstis ON SPLIT_PART (SPLIT_PART (ga.activitydata,' ',2),',',1) = mstis.assignmentid
    JOIN flx2.artifacts a ON mstis.studytrackitemid = a.id
    JOIN auth.memberhasroles mhr ON mhr.memberid = ga.ownerid
    JOIN flx2.memberroles mr ON mr.id = mhr.roleid
    WHERE ga.activitytype = 'assign'
        AND   ga.ownerid NOT IN (SELECT memberid FROM auth.memberhasroles WHERE roleid = 25)
        AND   a.artifacttypeid = 54
        AND   a.encodedid IS NOT NULL
)
select d.*,
    LEAD(creationtime,1) OVER (PARTITION BY ownerid ORDER BY creationtime) AS nexttime
from dataset d
ORDER BY ownerid, creationtime, encodedid, nexttime
LIMIT 1000
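
A likely reason this helps: window functions are evaluated before DISTINCT, so when LEAD sits in the same SELECT as DISTINCT it runs over the duplicate rows produced by the joins, and the differing nexttime values then stop DISTINCT from collapsing them. The sketch below reproduces that on a small scale. It is only an illustration under stated assumptions: it uses Python's sqlite3 module (SQLite 3.25 or later) as a stand-in for Redshift, and the `activity` table and its rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical stand-in for the joined result; the first row appears twice,
    -- as a join fan-out might produce.
    CREATE TABLE activity (ownerid INTEGER, creationtime TEXT);
    INSERT INTO activity VALUES
        (1, '2013-09-18 06:14:59'),
        (1, '2013-09-18 06:14:59'),
        (1, '2014-01-18 12:16:49');
""")

# LEAD in the same SELECT as DISTINCT: the window runs over all three rows first,
# so the two duplicates get different nexttime values and both survive DISTINCT.
mixed = conn.execute("""
    SELECT DISTINCT ownerid, creationtime,
           LEAD(creationtime, 1) OVER (PARTITION BY ownerid ORDER BY creationtime) AS nexttime
    FROM activity
""").fetchall()

# Deduplicate in a CTE first, then apply LEAD to the clean rows.
deduped = conn.execute("""
    WITH dataset AS (SELECT DISTINCT ownerid, creationtime FROM activity)
    SELECT ownerid, creationtime,
           LEAD(creationtime, 1) OVER (PARTITION BY ownerid ORDER BY creationtime) AS nexttime
    FROM dataset
    ORDER BY ownerid, creationtime
""").fetchall()

print(len(mixed), len(deduped))
```

With this toy data the first query keeps three rows while the second keeps two, which matches the pattern of extra rows and odd nexttime values described in the question.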

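Something like this (untested code) might work. The idea is to use the LEAD window function to get the creationtime of the following record for each owner, or NULL if it is the last record, and then use the regular DATEDIFF function to get the units you want. The CASE expression in the outer query handles the last-record edge case, and you can adjust it to get the result you want.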

select ownerid, creationtime,
    case when nexttime is not null
        then datediff('second', creationtime, nexttime)
        -- last record for this owner: measure up to the current time instead
        else datediff('second', creationtime, sysdate)
        end as timediff
from (
    select distinct ownerid, creationtime,
        lead(creationtime,1) over (partition by ownerid order by creationtime) as nexttime
    from yourdata
) t
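
For anyone who wants to try the LEAD idea without a Redshift cluster, here is a minimal runnable sketch. It makes several assumptions that are not part of the answer: SQLite (3.25 or later, via Python's sqlite3) stands in for Redshift, `strftime('%s', ...)` arithmetic replaces DATEDIFF because SQLite has no such function, the sample rows are made up, and the last record per owner gets 'NA' as requested in the question rather than the difference to the current time.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical sample data, already free of duplicates.
    CREATE TABLE yourdata (ownerid INTEGER, creationtime TEXT);
    INSERT INTO yourdata VALUES
        (1, '2017-02-22 10:00:00'),
        (1, '2017-02-22 10:00:30'),
        (2, '2017-02-22 11:00:00');
""")

rows = conn.execute("""
    SELECT ownerid, creationtime,
           CASE WHEN nexttime IS NOT NULL
                -- difference in seconds, cast to text so the column has one type
                THEN CAST(strftime('%s', nexttime) - strftime('%s', creationtime) AS TEXT)
                ELSE 'NA'
           END AS timediff
    FROM (
        SELECT ownerid, creationtime,
               LEAD(creationtime, 1) OVER (PARTITION BY ownerid ORDER BY creationtime) AS nexttime
        FROM yourdata
    ) AS t
    ORDER BY ownerid, creationtime
""").fetchall()

for row in rows:
    print(row)
```

Casting the numeric difference to text is a design choice forced by mixing it with 'NA' in one column; returning NULL for the last record and formatting it on the client would keep the column numeric.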

Answer 1: (score: 1)

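Personally, I don't think there is a declarative (pure SQL) way to achieve this. Sorry. You cannot reference the value of a specific record in a set (even if it is the next or previous one); that is simply the nature of sets.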

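So I can see three ways here:

1) Use a procedural extension to SQL (MySQL has one too).

2) Fetch the whole set and process it externally, on the "client" side (relative to the RDBMS).

3) Add a timediff column to the table, plus an AFTER INSERT / UPDATE trigger in which you calculate that difference and write it to the record.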
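
Option 2 in the list above could look roughly like the following sketch. It assumes the rows have already been fetched as `(ownerid, creationtime)` tuples with `datetime` values; the function name `add_timediff` and the sample rows are hypothetical, and 'NA' marks the last record per owner as requested in the question.

```python
from datetime import datetime
from itertools import groupby
from operator import itemgetter


def add_timediff(rows):
    """Return (ownerid, creationtime, timediff) tuples, timediff in seconds or 'NA'."""
    result = []
    # Sort by owner, then by time, so each owner's records are contiguous and ordered.
    ordered = sorted(rows, key=itemgetter(0, 1))
    for ownerid, group in groupby(ordered, key=itemgetter(0)):
        times = [creationtime for _, creationtime in group]
        for i, current in enumerate(times):
            if i + 1 < len(times):
                # Difference to the next record of the same owner.
                diff = (times[i + 1] - current).total_seconds()
            else:
                # Last record for this owner: nothing to compare against.
                diff = "NA"
            result.append((ownerid, current, diff))
    return result


sample = [
    (2, datetime(2017, 2, 22, 11, 0, 0)),
    (1, datetime(2017, 2, 22, 10, 0, 30)),
    (1, datetime(2017, 2, 22, 10, 0, 0)),
]
out = add_timediff(sample)
for row in out:
    print(row)
```

This keeps the database query simple at the cost of transferring every row to the client, which may matter for large tables.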