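SQL - Count the first occurrence of unique values

Posted: 2019-03-29 00:14:25

Tags: sql postgresql

I have a log table with user activity. I'm trying to write a query that shows the number of unique users and the number of new users.

Sample data: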

| uid | act | tm                       |
| --- | --- | ------------------------ |
| 1   | l   | 2019-01-02T00:00:00.000Z |
| 1   | l   | 2019-01-05T00:00:00.000Z |
| 2   | l   | 2019-02-02T00:00:00.000Z |
| 1   | l   | 2019-02-03T00:00:00.000Z |
| 2   | l   | 2019-02-04T00:00:00.000Z |
| 3   | l   | 2019-02-05T00:00:00.000Z |
| 1   | l   | 2019-03-02T00:00:00.000Z |
| 2   | l   | 2019-03-02T00:00:00.000Z |
| 3   | l   | 2019-03-02T00:00:00.000Z |
| 4   | l   | 2019-03-02T00:00:00.000Z |

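The first part is easy: count(distinct(uid)) as tot_users

But is there a way to do the second part, i.e. count the users who appear in that period but have never appeared before?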
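To pin down what the two numbers should be, here is a minimal plain-Python sketch that computes them for the sample data above. It assumes a "period" is a calendar month and that a user is "new" in the month containing their earliest entry; the names LOGS and monthly_user_counts are hypothetical, chosen only for this illustration.

```python
from collections import defaultdict

# Sample rows from the question: (uid, act, tm), with ISO-8601 text timestamps.
LOGS = [
    (1, "l", "2019-01-02T00:00:00.000Z"),
    (1, "l", "2019-01-05T00:00:00.000Z"),
    (2, "l", "2019-02-02T00:00:00.000Z"),
    (1, "l", "2019-02-03T00:00:00.000Z"),
    (2, "l", "2019-02-04T00:00:00.000Z"),
    (3, "l", "2019-02-05T00:00:00.000Z"),
    (1, "l", "2019-03-02T00:00:00.000Z"),
    (2, "l", "2019-03-02T00:00:00.000Z"),
    (3, "l", "2019-03-02T00:00:00.000Z"),
    (4, "l", "2019-03-02T00:00:00.000Z"),
]


def monthly_user_counts(rows):
    """Return {month: (tot_users, new_users)} keyed by 'YYYY-MM' (hypothetical helper)."""
    users_by_month = defaultdict(set)  # month -> distinct uids seen that month
    first_month = {}                   # uid -> month of the user's earliest entry
    for uid, _act, tm in rows:
        month = tm[:7]  # 'YYYY-MM' prefix of the ISO timestamp
        users_by_month[month].add(uid)
        # 'YYYY-MM' strings sort chronologically, so min() keeps the earliest month
        first_month[uid] = min(first_month.get(uid, month), month)
    # Only after every row is seen do we know each user's first month.
    return {
        month: (len(uids), sum(1 for u in uids if first_month[u] == month))
        for month, uids in sorted(users_by_month.items())
    }


print(monthly_user_counts(LOGS))
# {'2019-01': (1, 1), '2019-02': (3, 2), '2019-03': (4, 1)}
```

A reference like this is handy for checking whichever SQL query you end up with against the same sample data.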

Here's what I have so far: https://www.db-fiddle.com/f/8EXsih1VAL1iWXKeauPQiB/1

For future reference, I've updated the db-fiddle with the 2 suggested solutions. Both work well:

https://www.db-fiddle.com/f/8EXsih1VAL1iWXKeauPQiB/6

```sql
SELECT
        to_char(date_trunc('month', tm), 'YYYY-MM') AS mnth,
        count(uid) AS tot_entries,
        COUNT(DISTINCT uid) AS tot_users,
        -- a user is new in the month that contains their first-ever entry
        COUNT(DISTINCT
                CASE
                    WHEN DATE_TRUNC('month', min_tm) = DATE_TRUNC('month', tm)
                    THEN uid
                END) AS new_users
-- min_tm: each user's earliest timestamp across the whole table
FROM (SELECT l.*, MIN(tm) OVER (PARTITION BY uid) AS min_tm FROM logs l) x
GROUP BY mnth
ORDER BY mnth;
```


```sql
SELECT
        to_char(date_trunc('month', l1.tm), 'YYYY-MM') mnth,
        count(l1.uid) tot_entries,
        count(DISTINCT l1.uid) tot_users,
        -- count the uid only if the user has no entry in an earlier month
        count(DISTINCT
                CASE
                    WHEN NOT EXISTS (SELECT *
                                        FROM logs l2
                                        WHERE l2.uid = l1.uid
                                            -- 'YYYY-MM' strings sort chronologically
                                            AND to_char(date_trunc('month', l2.tm), 'YYYY-MM') < to_char(date_trunc('month', l1.tm), 'YYYY-MM'))
                    THEN
                          l1.uid
                END) new_users
FROM logs l1
GROUP BY mnth
ORDER BY mnth;
```

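3 answers:

Answer 0 (score: 1)

You can use conditional aggregation. In a CASE expression, check whether the same user has a log entry in an earlier month. Unless such an entry is found, return the user's ID. Then use that expression as the argument to count().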

```sql
SELECT to_char(date_trunc('month', l1.tm), 'YYYY-MM') mnth,
       count(l1.uid) tot_entries,
       count(DISTINCT l1.uid) tot_users,
       -- return the uid only if the user has no entry in an earlier month
       count(DISTINCT CASE
                        WHEN NOT EXISTS (SELECT *
                                                FROM logs l2
                                                WHERE l2.uid = l1.uid
                                                      -- 'YYYY-MM' strings sort chronologically
                                                      AND to_char(date_trunc('month', l2.tm), 'YYYY-MM') < to_char(date_trunc('month', l1.tm), 'YYYY-MM')) THEN
                          l1.uid
                      END) new_users
       FROM logs l1
       GROUP BY mnth
       ORDER BY mnth;
```
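To check the conditional-aggregation idea end to end, here is a minimal sketch that runs an adapted version of the query on the sample data with Python's built-in sqlite3 module. It assumes SQLite instead of PostgreSQL: SQLite has no date_trunc or to_char, so the month is taken as substr(tm, 1, 7) of an ISO-8601 text timestamp. ROWS, QUERY and monthly_counts are hypothetical names for this illustration.

```python
import sqlite3

# The question's sample data; timestamps stored as ISO-8601 TEXT (assumption).
ROWS = [
    (1, "l", "2019-01-02T00:00:00.000Z"),
    (1, "l", "2019-01-05T00:00:00.000Z"),
    (2, "l", "2019-02-02T00:00:00.000Z"),
    (1, "l", "2019-02-03T00:00:00.000Z"),
    (2, "l", "2019-02-04T00:00:00.000Z"),
    (3, "l", "2019-02-05T00:00:00.000Z"),
    (1, "l", "2019-03-02T00:00:00.000Z"),
    (2, "l", "2019-03-02T00:00:00.000Z"),
    (3, "l", "2019-03-02T00:00:00.000Z"),
    (4, "l", "2019-03-02T00:00:00.000Z"),
]

# NOT EXISTS inside CASE, adapted to SQLite: the 'YYYY-MM' prefix replaces
# to_char(date_trunc('month', tm), 'YYYY-MM').
QUERY = """
SELECT substr(l1.tm, 1, 7) AS mnth,
       count(l1.uid) AS tot_entries,
       count(DISTINCT l1.uid) AS tot_users,
       count(DISTINCT CASE
                        WHEN NOT EXISTS (SELECT 1
                                         FROM logs l2
                                         WHERE l2.uid = l1.uid
                                           AND substr(l2.tm, 1, 7) < substr(l1.tm, 1, 7))
                        THEN l1.uid  -- no entry in an earlier month: new user
                      END) AS new_users
FROM logs l1
GROUP BY mnth
ORDER BY mnth
"""


def monthly_counts(rows):
    """Load rows into an in-memory table and run QUERY (hypothetical helper)."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE logs (uid INTEGER, act TEXT, tm TEXT)")
        conn.executemany("INSERT INTO logs VALUES (?, ?, ?)", rows)
        return conn.execute(QUERY).fetchall()
    finally:
        conn.close()


for row in monthly_counts(ROWS):
    print(row)
# ('2019-01', 2, 1, 1)
# ('2019-02', 4, 3, 2)
# ('2019-03', 4, 4, 1)
```

The correlated subquery runs once per log row, so on a large table the window-function approach may be cheaper; this sketch only demonstrates the logic.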

Answer 1 (score: 1)

You can use a window function in a subquery to compute the timestamp of each user's first log entry, for example:

```sql
SELECT l.*, MIN(tm) OVER(PARTITION BY uid) min_tm FROM logs l
```

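You can then analyze the results in the outer query: a user counts as new when the date of their first log entry falls within the analysis interval.

Assuming the parameters :start_tm and :end_tm mark the start and end of the analysis period, you would write: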

```sql
SELECT
    COUNT(DISTINCT uid) as tot_users,
    COUNT(DISTINCT CASE WHEN min_tm >= :start_tm AND min_tm < :end_tm THEN uid END) AS tot_new_users
FROM (SELECT l.*, MIN(tm) OVER(PARTITION BY uid) min_tm FROM logs l) x
WHERE tm >= :start_tm AND tm < :end_tm
```
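A minimal sketch of the single-period query, run with Python's sqlite3 module on the sample data. It assumes SQLite 3.25 or later (needed for window functions) and ISO-8601 text timestamps, so that plain string comparison against bounds like '2019-02-01' is chronological; ROWS, PERIOD_QUERY and period_counts are hypothetical names. Conveniently, sqlite3 binds the :start_tm / :end_tm placeholders from a dict, so the query text is unchanged apart from formatting.

```python
import sqlite3

# The question's sample data; timestamps stored as ISO-8601 TEXT (assumption).
ROWS = [
    (1, "l", "2019-01-02T00:00:00.000Z"),
    (1, "l", "2019-01-05T00:00:00.000Z"),
    (2, "l", "2019-02-02T00:00:00.000Z"),
    (1, "l", "2019-02-03T00:00:00.000Z"),
    (2, "l", "2019-02-04T00:00:00.000Z"),
    (3, "l", "2019-02-05T00:00:00.000Z"),
    (1, "l", "2019-03-02T00:00:00.000Z"),
    (2, "l", "2019-03-02T00:00:00.000Z"),
    (3, "l", "2019-03-02T00:00:00.000Z"),
    (4, "l", "2019-03-02T00:00:00.000Z"),
]

# The window function runs in the subquery over ALL rows, so min_tm is each
# user's first entry ever; the outer WHERE only limits which entries are counted.
PERIOD_QUERY = """
SELECT
    COUNT(DISTINCT uid) AS tot_users,
    COUNT(DISTINCT CASE WHEN min_tm >= :start_tm AND min_tm < :end_tm THEN uid END) AS tot_new_users
FROM (SELECT l.*, MIN(tm) OVER (PARTITION BY uid) AS min_tm FROM logs l) x
WHERE tm >= :start_tm AND tm < :end_tm
"""


def period_counts(rows, start_tm, end_tm):
    """Return (tot_users, tot_new_users) for [start_tm, end_tm) (hypothetical helper)."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE logs (uid INTEGER, act TEXT, tm TEXT)")
        conn.executemany("INSERT INTO logs VALUES (?, ?, ?)", rows)
        params = {"start_tm": start_tm, "end_tm": end_tm}  # named placeholders
        return conn.execute(PERIOD_QUERY, params).fetchone()
    finally:
        conn.close()


print(period_counts(ROWS, "2019-02-01", "2019-03-01"))  # (3, 2)
```

Moving the WHERE clause into the subquery would change the meaning: MIN(tm) would then only see entries inside the period, and every user would look new.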

If you need to aggregate by month:

```sql
SELECT
    DATE_TRUNC('month', tm) AS my_month,
    COUNT(DISTINCT uid) as tot_users,
    COUNT(DISTINCT CASE WHEN DATE_TRUNC('month', min_tm) = DATE_TRUNC('month', tm) THEN uid END) AS tot_new_users
FROM (SELECT l.*, MIN(tm) OVER(PARTITION BY uid) min_tm FROM logs l) x
GROUP BY my_month
ORDER BY my_month
```
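The monthly version can be sketched the same way with Python's sqlite3 module. This assumes SQLite 3.25+ and ISO-8601 text timestamps, and it swaps DATE_TRUNC('month', ...) for substr(..., 1, 7), which yields a 'YYYY-MM' string rather than a timestamp; ROWS, MONTHLY_QUERY and monthly_new_users are hypothetical names.

```python
import sqlite3

# The question's sample data; timestamps stored as ISO-8601 TEXT (assumption).
ROWS = [
    (1, "l", "2019-01-02T00:00:00.000Z"),
    (1, "l", "2019-01-05T00:00:00.000Z"),
    (2, "l", "2019-02-02T00:00:00.000Z"),
    (1, "l", "2019-02-03T00:00:00.000Z"),
    (2, "l", "2019-02-04T00:00:00.000Z"),
    (3, "l", "2019-02-05T00:00:00.000Z"),
    (1, "l", "2019-03-02T00:00:00.000Z"),
    (2, "l", "2019-03-02T00:00:00.000Z"),
    (3, "l", "2019-03-02T00:00:00.000Z"),
    (4, "l", "2019-03-02T00:00:00.000Z"),
]

# Same shape as the monthly query above; substr(x, 1, 7) stands in for DATE_TRUNC.
MONTHLY_QUERY = """
SELECT
    substr(tm, 1, 7) AS my_month,
    COUNT(DISTINCT uid) AS tot_users,
    -- new in this month if the user's first-ever entry is in this month
    COUNT(DISTINCT CASE WHEN substr(min_tm, 1, 7) = substr(tm, 1, 7) THEN uid END) AS tot_new_users
FROM (SELECT l.*, MIN(tm) OVER (PARTITION BY uid) AS min_tm FROM logs l) x
GROUP BY my_month
ORDER BY my_month
"""


def monthly_new_users(rows):
    """Return [(my_month, tot_users, tot_new_users), ...] (hypothetical helper)."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE logs (uid INTEGER, act TEXT, tm TEXT)")
        conn.executemany("INSERT INTO logs VALUES (?, ?, ?)", rows)
        return conn.execute(MONTHLY_QUERY).fetchall()
    finally:
        conn.close()


print(monthly_new_users(ROWS))
# [('2019-01', 1, 1), ('2019-02', 3, 2), ('2019-03', 4, 1)]
```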

Answer 2 (score: 0)

You could use a HAVING clause or a self-join. You mentioned a period, so I'm not sure of the exact filter condition, but in the simple case you would GROUP BY uid and use HAVING to compare each user's first entry, MIN(tm), with the period.
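A minimal sketch of the HAVING idea for a single period, again using Python's sqlite3 module on the sample data. It assumes ISO-8601 text timestamps compared as strings and a half-open period [start_tm, end_tm); NEW_USERS_QUERY and new_users_in_period are hypothetical names. It only yields the new-user count, which you would pair with COUNT(DISTINCT uid) over the same period to get both numbers.

```python
import sqlite3

# The question's sample data; timestamps stored as ISO-8601 TEXT (assumption).
ROWS = [
    (1, "l", "2019-01-02T00:00:00.000Z"),
    (1, "l", "2019-01-05T00:00:00.000Z"),
    (2, "l", "2019-02-02T00:00:00.000Z"),
    (1, "l", "2019-02-03T00:00:00.000Z"),
    (2, "l", "2019-02-04T00:00:00.000Z"),
    (3, "l", "2019-02-05T00:00:00.000Z"),
    (1, "l", "2019-03-02T00:00:00.000Z"),
    (2, "l", "2019-03-02T00:00:00.000Z"),
    (3, "l", "2019-03-02T00:00:00.000Z"),
    (4, "l", "2019-03-02T00:00:00.000Z"),
]

# One group per user; HAVING keeps only users whose earliest entry is in the period.
NEW_USERS_QUERY = """
SELECT COUNT(*) AS new_users
FROM (SELECT uid
      FROM logs
      GROUP BY uid
      HAVING MIN(tm) >= :start_tm AND MIN(tm) < :end_tm) AS firsts
"""


def new_users_in_period(rows, start_tm, end_tm):
    """Count users whose first entry falls in [start_tm, end_tm) (hypothetical helper)."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE logs (uid INTEGER, act TEXT, tm TEXT)")
        conn.executemany("INSERT INTO logs VALUES (?, ?, ?)", rows)
        params = {"start_tm": start_tm, "end_tm": end_tm}
        return conn.execute(NEW_USERS_QUERY, params).fetchone()[0]
    finally:
        conn.close()


print(new_users_in_period(ROWS, "2019-02-01", "2019-03-01"))  # 2
```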