PostgreSQL: NOT IN clause

Time: 2018-05-20 11:48:46

Tags: sql postgresql analytics

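I want to calculate DAU while excluding users we don't consider "real" (employees, beta testers, etc.).

Previously, writing the filter into the query like this worked well: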

SELECT 
    count(distinct user_id) AS daily, 
    e.event_timestamp::DATE AS date
FROM 
    "public"."events" AS e
WHERE
   user_id IN (SELECT  
           distinct id
        from
            "user"."user"
        WHERE 
            username IS NOT NULL AND position IS NOT NULL )
GROUP BY date

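When I changed it to the version below, I expected more or less the same counts (basically, instead of defining the 4,000 "real users", I define the 1,000 "non-users" I want to exclude). However, it gives me much higher counts. It is as if the DISTINCT were not working.

I added IS NOT NULL to the subquery, but that did not change the result. Does NOT IN with a subquery work differently from the IN clause?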

SELECT 
    count(distinct e.user_id) AS daily, 
    e.event_timestamp::DATE AS date
FROM 
    "public"."events" AS e
WHERE
   e.user_id NOT IN (SELECT distinct id FROM "public"."non_users" WHERE id IS NOT NULL)
GROUP BY 
    date
ORDER BY
    date

1 answer:

Answer 0 (score: 1)

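Yes. If any value returned by the subquery is NULL, NOT IN returns no rows at all. For this reason, I strongly recommend always using NOT EXISTS; it behaves as expected.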
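The difference described above can be reproduced with a minimal sketch. It uses hypothetical inline VALUES lists (the aliases t, s, x and y are made up for illustration) rather than the tables from the question:

```sql
-- Hypothetical inline data, not the question's tables.
-- NOT IN against a set that contains NULL: for x = 2 and x = 3 the
-- comparison with NULL is unknown, so the predicate is never TRUE
-- and the query returns no rows.
SELECT x
FROM (VALUES (1), (2), (3)) AS t(x)
WHERE x NOT IN (SELECT y FROM (VALUES (1), (NULL)) AS s(y));

-- NOT EXISTS over the same data: the NULL row simply never matches,
-- so the rows with x = 2 and x = 3 are returned.
SELECT x
FROM (VALUES (1), (2), (3)) AS t(x)
WHERE NOT EXISTS (
    SELECT 1
    FROM (VALUES (1), (NULL)) AS s(y)
    WHERE s.y = t.x
);
```

Filtering NULLs inside the subquery, as the question already does with `WHERE id IS NOT NULL`, avoids the first behavior, which is why the NULL trap alone is unlikely to explain the higher counts.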

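You seem to know this already, because you have the NULL comparison in the WHERE clause. So the difference is probably due to the other conditions. Include those as well: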

SELECT count(distinct e.user_id) AS daily, 
       e.event_timestamp::DATE AS date
FROM  "public"."events" e
WHERE NOT EXISTS (SELECT 1
                  FROM "public"."non_users" nu
                  WHERE e.user_id = nu.id AND
                        nu.position IS NOT NULL
                 )
GROUP BY date
ORDER BY date;
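If the counts still differ, one possible check is to look for user_id values in events that belong to neither set. This is only a sketch: it reuses the table and column names from the question and assumes the mismatch comes from ids that are missing from both "user"."user" (under the original filter) and "public"."non_users".

```sql
-- Sketch: event user_ids that are neither "real" users (per the first query's
-- filter) nor listed in non_users. Such ids are counted by the exclusion-based
-- query but not by the original IN-based query.
SELECT DISTINCT e.user_id
FROM "public"."events" AS e
WHERE e.user_id IS NOT NULL  -- count(distinct ...) ignores NULLs anyway
  AND NOT EXISTS (SELECT 1
                  FROM "user"."user" u
                  WHERE u.id = e.user_id AND
                        u.username IS NOT NULL AND
                        u.position IS NOT NULL
                 )
  AND NOT EXISTS (SELECT 1
                  FROM "public"."non_users" nu
                  WHERE nu.id = e.user_id
                 );
```

Any rows returned here would show that the two definitions are not exact complements of each other.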