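I want to compute DAU while excluding users we don't consider "real" (employees, beta testers, etc.).
Previously, writing the filter directly in the query worked fine: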
SELECT
count(distinct user_id) AS daily,
e.event_timestamp::DATE AS date
FROM
"public"."events" AS e
WHERE
user_id IN (SELECT
distinct id
from
"user"."user"
WHERE
username IS NOT NULL AND position IS NOT NULL )
GROUP BY date
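When I tried changing it to the query below, it should give more or less the same counts (basically, instead of defining the 4000 "real users", I define the 1000 "non-users" I want to exclude). However, this gives me much higher counts. It's as if the DISTINCT isn't working.
I added IS NOT NULL to the subquery, but it didn't change the result. Does NOT IN with a subquery work differently from an IN clause in some way?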
SELECT
count(distinct e.user_id) AS daily,
e.event_timestamp::DATE AS date
FROM
"public"."events" AS e
WHERE
e.user_id NOT IN (SELECT distinct id FROM "public"."non_users" WHERE id IS NOT NULL)
GROUP BY
date
ORDER BY
date
Answer 0 (score: 1)
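Yes. If any value in the subquery is NULL, NOT IN returns no rows. For this reason, I strongly recommend always using NOT EXISTS; it behaves as expected.
You seem to know this, since you use the NULL comparison in the WHERE. So the difference is probably due to the other condition. Include it as well: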
SELECT count(distinct e.user_id) AS daily,
e.event_timestamp::DATE AS date
FROM "public"."events" e
WHERE NOT EXISTS (SELECT 1
FROM "public"."non_users" nu
WHERE e.user_id = nu.id AND
nu.position IS NOT NULL
)
GROUP BY date
ORDER BY date;
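
To see the NULL behaviour described above in isolation, here is a minimal sketch. It uses Python's built-in sqlite3 module with an in-memory database as a stand-in for PostgreSQL (an assumption; both apply the same three-valued logic to NOT IN), and the tiny events and non_users tables and their rows are made up for illustration, not taken from the question's schema.

```python
import sqlite3

# In-memory database with hypothetical sample data (not the real schema).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE events (user_id INTEGER);
    CREATE TABLE non_users (id INTEGER);
    INSERT INTO events (user_id) VALUES (1), (2), (3);
    -- One real exclusion (2) and one NULL id.
    INSERT INTO non_users (id) VALUES (2), (NULL);
""")

def user_ids(sql):
    """Run a query and return the first column as a sorted list."""
    return sorted(row[0] for row in conn.execute(sql))

# NOT IN against a subquery containing NULL: the comparison with NULL is
# unknown, so no row ever satisfies the predicate.
not_in = user_ids("""
    SELECT user_id FROM events
    WHERE user_id NOT IN (SELECT id FROM non_users)
""")

# Filtering NULLs out of the subquery restores the expected result.
not_in_filtered = user_ids("""
    SELECT user_id FROM events
    WHERE user_id NOT IN (SELECT id FROM non_users WHERE id IS NOT NULL)
""")

# NOT EXISTS is unaffected by NULLs in non_users.
not_exists = user_ids("""
    SELECT e.user_id FROM events e
    WHERE NOT EXISTS (SELECT 1 FROM non_users nu WHERE nu.id = e.user_id)
""")

print("NOT IN:", not_in)
print("NOT IN (NULLs filtered):", not_in_filtered)
print("NOT EXISTS:", not_exists)
```

With this sample data the plain NOT IN query returns nothing, while the NULL-filtered NOT IN and the NOT EXISTS queries both return users 1 and 3. Since the question's subquery already filters out NULL ids, this particular pitfall would make counts drop rather than rise, which is why the answer points at the other condition instead.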