T-SQL: Counting the number of failures before the first success

Time: 2017-08-11 15:39:17

Tags: sql sql-server tsql analytics vertica

I have a database made up of timestamped events:

row  eventName  taskName  timestamp  userName
1    fail       ABC       10.5       John
2    fail       ABC       18.0       John
3    fail       ABC       19.0       Mike
4    fail       XYZ       21.0       John
5    fail       XYZ       23.0       Mike
6    success    ABC       25.0       John
7    fail       ABC       26.0       John
8    success    ABC       28.0       John

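I want to count, for each user, the number of failures before the first success (and the average, but that is beyond the scope of this question).

In the example above, John failed task ABC twice (rows 1 and 2) before succeeding (row 6). Later failures and successes can be ignored.

I think I could do this by counting the rows with 'ABC' and 'fail' whose timestamp is earlier than the earliest timestamp among all rows with 'ABC' and 'success', grouped by userName. How do I express this in T-SQL? Specifically, Vertica.

This seems very similar to the case here: sql count/sum the number of calls until a specific date in another column

But when I tried to adapt the code from https://stackoverflow.com/a/39594686/4354459, I think I made a mistake, because my counts are higher than expected.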

WITH
Successes
AS
(
    SELECT
        events.userName
        ,events.taskName
        ,MIN(events.timestamp) AS FirstSuccessTime
    FROM events
    WHERE events.eventName = 'success'
    GROUP BY events.userName, events.taskName
)
SELECT
    events.userName
    ,events.taskName
    ,COUNT(events.eventName) AS FailuresUntilFirstSuccess
FROM
    Successes
    LEFT JOIN events
        ON  events.taskName = Successes.taskName
        AND events.timestamp < Successes.FirstSuccessTime
        AND events.eventName = 'fail'
GROUP BY events.userName, events.taskName
;
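
One likely reason the counts come out too high is that the join matches failures to successes on taskName only, so one user's failures are counted against another user's first success. With the sample data, Mike's failure in row 3 is matched against John's success in row 6. A minimal sketch of the same query with userName added to the join condition follows. It assumes the events table from the question and has not been tested on Vertica. Grouping by the Successes columns is a choice made here so that a user who succeeds without any earlier failure still appears, with a count of 0.

```sql
WITH
Successes
AS
(
    -- Earliest success per user and task
    SELECT
        events.userName
        ,events.taskName
        ,MIN(events.timestamp) AS FirstSuccessTime
    FROM events
    WHERE events.eventName = 'success'
    GROUP BY events.userName, events.taskName
)
SELECT
    Successes.userName
    ,Successes.taskName
    -- COUNT ignores the NULLs produced by the LEFT JOIN, so no failures gives 0
    ,COUNT(events.eventName) AS FailuresUntilFirstSuccess
FROM
    Successes
    LEFT JOIN events
        ON  events.userName = Successes.userName   -- the condition missing above
        AND events.taskName = Successes.taskName
        AND events.timestamp < Successes.FirstSuccessTime
        AND events.eventName = 'fail'
GROUP BY Successes.userName, Successes.taskName
;
```

On the sample data this returns a single row, John / ABC / 2. Mike has no success, so he does not appear.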

2 Answers:

Answer 0: (score: 1)

Solution

Depending on your schema, this query will give you what you need:

with Failures as
(
    select * from Event where event_name = 'fail'
),

Q as
(
    select * from Event E
        outer apply
        (
            select count(*) cnt from Failures F
                where F.task_name = E.task_name and F.username = E.username and F.ts < E.ts
        ) F

    where E.event_name = 'success'
)

select * from
(
    select Q.*, 
    row_number() over (partition by event_name, task_name, username order by ts) o from Q
) K where K.o = 1
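
To try the query, the Event table has to exist first. The following is a minimal setup sketch loaded with the rows from the question. The column types and the IDENTITY id are assumptions inferred from the column names in the query and from the four-value insert used later in this answer; this is not the answerer's actual script.

```sql
-- Assumed schema: names taken from the query above, types guessed
CREATE TABLE Event
(
    id         int IDENTITY(1,1) PRIMARY KEY,  -- generated, so inserts supply four values
    event_name varchar(10),
    task_name  varchar(10),
    ts         decimal(3,1),
    username   varchar(10)
);

-- Rows from the question, listed in the question's order;
-- the generated id values normally follow that order
INSERT INTO Event (event_name, task_name, ts, username)
VALUES
      ('fail',    'ABC', 10.5, 'John')
    , ('fail',    'ABC', 18.0, 'John')
    , ('fail',    'ABC', 19.0, 'Mike')
    , ('fail',    'XYZ', 21.0, 'John')
    , ('fail',    'XYZ', 23.0, 'Mike')
    , ('success', 'ABC', 25.0, 'John')
    , ('fail',    'ABC', 26.0, 'John')
    , ('success', 'ABC', 28.0, 'John')
;
```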

Testing with your data yields:

id event_name task_name  timestamp  username  cnt
-- ---------- ---------- ---------- --------- ---
6  success    ABC        25         John      2

However, I went a step further and added another 'success' row for Mike:

insert Event select 'success', 'XYZ', 29.0, 'Mike'

and got:

 id event_name task_name  timestamp  username  cnt
 -- ---------- ---------- ---------- --------- ---
 6  success    ABC        25         John      2
 9  success    XYZ        29         Mike      1

As expected.

Explanation

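The first CTE produces the set of failures. The second CTE takes the set of successes and, through OUTER APPLY, attaches to each one the count (cardinality) of the failures that precede it for the same user and task name.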

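Finally, we use row_number with a partition on event_name, task_name, and username, ordered by ts, so that the first success in each partition is numbered 1. We then filter out every row whose row_number is not 1.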

Answer 1: (score: 0)

There may be a simpler way to get there, but I'll take a stab at it.

Test data setup

Query time

IF OBJECT_ID(N'tempdb..#taskevents', N'U') IS NOT NULL   
DROP TABLE #taskevents;  
GO  

CREATE TABLE #taskevents ( 
      eventName varchar(10)
    , taskName varchar(10)
    , ts decimal(3,1)
    , userName varchar(10)
) ;

INSERT INTO #taskevents ( eventName, taskName, ts, userName )
VALUES 
      ('fail','ABC','10.5','John')
    , ('fail','ABC','10.6','John')
    , ('fail','ABC','18.0','John')
    , ('fail','ABC','22.0','John')
    , ('fail','ABC','22.5','John')
    , ('success','ABC','25.0','John')

    , ('fail','ABC','26.0','John')
    , ('success','ABC','28.0','John')

    , ('fail','XYZ','10.7','John')
    , ('fail','XYZ','21.0','John')

    , ('fail','ABC','19.0','Mike')

    , ('fail','XYZ','23.0','Mike')
    , ('success','XYZ','28.5','Mike')

    , ('success','QVC','42.0','Mike')
;

This gives you the average failure rate for each user.
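
A minimal sketch of one way to compute such an average from #taskevents is given here; it is an assumption about the intended logic, not the answerer's own query. It tags every row with the first success time of its user and task using a windowed MIN, counts the failures that come before that time, and then averages those counts per user. The names Marked, PerTask, firstSuccessTs, failuresBeforeSuccess, and avgFailuresBeforeSuccess are chosen for illustration only.

```sql
WITH Marked AS
(
    -- Attach the earliest success time of each user/task pair to every row;
    -- pairs with no success get NULL
    SELECT
          userName
        , taskName
        , eventName
        , ts
        , MIN(CASE WHEN eventName = 'success' THEN ts END)
              OVER (PARTITION BY userName, taskName) AS firstSuccessTs
    FROM #taskevents
),
PerTask AS
(
    -- Count failures strictly before the first success
    SELECT
          userName
        , taskName
        , SUM(CASE WHEN eventName = 'fail' AND ts < firstSuccessTs
                   THEN 1 ELSE 0 END) AS failuresBeforeSuccess
    FROM Marked
    WHERE firstSuccessTs IS NOT NULL   -- skip pairs that never succeeded
    GROUP BY userName, taskName
)
SELECT
      userName
      -- Cast first: AVG over an int column would truncate to an integer
    , AVG(CAST(failuresBeforeSuccess AS decimal(9,2))) AS avgFailuresBeforeSuccess
FROM PerTask
GROUP BY userName;
```

With the rows inserted above, John averages 5 failures before a first success (task ABC only) and Mike averages 0.5 (1 for XYZ, 0 for QVC). Pairs that never succeed, such as John on XYZ, are left out of the average.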