I am trying to pull some information from a first table and then link it to some demographic information I have.
SELECT
colA,
colB,
DATE(serverTimeStamp) AS newDate,
eventType,
pgSource,
COUNT(*) FROM (
SELECT
*,
MAX(IF(LOWER(parameters.name)="pagesource", parameters.value, NULL)) WITHIN RECORD AS pgSource
FROM
TABLE_DATE_RANGE(mytableA, TIMESTAMP('2016-02-02 00:00:00'), TIMESTAMP('2016-02-02 23:59:59')) )
WHERE
LOWER(parameters.name)="isfirstcontact"
GROUP BY
colA,
colB,
newDate,
eventType,
pgSource
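However, adding the new table changes my results (the counts should be the same). The query above runs against the first table alone and returns the correct data; the query below adds the join.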
SELECT
colA,
colB,
DATE(serverTimeStamp) AS newDate,
eventType,
UD.gender,
UD.locationKey,
pgSource,
COUNT(DISTINCT instanceId) FROM (
SELECT
*,
MAX(IF(LOWER(parameters.name)="pagesource", parameters.value, NULL)) WITHIN RECORD AS pgSource
FROM
TABLE_DATE_RANGE(myTableA, TIMESTAMP('2016-02-02 00:00:00'), TIMESTAMP('2016-02-02 23:59:59')) ) EV
JOIN EACH
replicated.UserDimension AS UD
ON
UD.userId = EV.userId
WHERE
LOWER(parameters.name)="isfirstcontact"
GROUP EACH BY
colA,
colB,
newDate,
eventType,
pgSource,
UD.gender,
UD.locationKey
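Any tips on how to handle this?
** Mikhail kindly pointed out that having multiple rows per userId in the first table would throw my counts off. How do I adjust for this?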
Answer 0 (score: 1)
Any tips?
Just to give you an idea, run the query below and compare the four counts
SELECT
COUNT(*),
COUNT(1),
COUNT(instanceId),
COUNT(DISTINCT instanceId)
FROM
(SELECT NULL AS instanceId),
(SELECT 1 AS instanceId),
(SELECT 2 AS instanceId),
(SELECT 1 AS instanceId)
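Also, I suggest checking the difference between COUNT([DISTINCT] ...) and EXACT_COUNT_DISTINCT(): in legacy SQL, COUNT(DISTINCT ...) returns an approximation once the number of distinct values passes a threshold (1000 by default), whereas EXACT_COUNT_DISTINCT() is always exact.
Another direction to look into: check whether either side has duplicate userIds (multiple rows for the same userId). That could also be the source of the count mismatch.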
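The duplicate check suggested above could be sketched in legacy SQL roughly as follows. This is only an illustration, not a tested query: it reuses the replicated.UserDimension table and the userId column from the question, and row_count is a hypothetical alias introduced here.

```sql
-- Legacy SQL sketch: list userIds that appear on more than one row
-- of the dimension table. Table and column names are taken from the
-- question; row_count is a made-up alias for this example.
SELECT
  userId,
  COUNT(*) AS row_count
FROM
  replicated.UserDimension
GROUP BY
  userId
HAVING
  row_count > 1
ORDER BY
  row_count DESC
LIMIT 100
```

If this returns any rows, the join repeats each matching event row once per duplicate dimension row, which inflates COUNT(*). The same check can be pointed at the event table, although several rows per userId are probably expected there.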