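Unable to join correctly in BigQuery

Time: 2016-04-21 01:06:48

Tags: sql google-bigquery

I am trying to pull some information from a first table and then link it to some demographic information I have.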

SELECT
  colA,
  colB,
  DATE(serverTimeStamp) AS newDate,
  eventType,
  pgSource,
  COUNT(*)
FROM (
  SELECT
    *,
    MAX(IF(LOWER(parameters.name)="pagesource", parameters.value, NULL)) WITHIN RECORD AS pgSource
  FROM
    TABLE_DATE_RANGE(mytableA, TIMESTAMP('2016-02-02 00:00:00'), TIMESTAMP('2016-02-02 23:59:59')) )
WHERE
  LOWER(parameters.name)="isfirstcontact"
GROUP BY
  colA,
  colB,
  newDate,
  eventType,
  pgSource

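The query above, which uses only the first table, returns the correct data. However, adding the new table changes my results (the counts should be the same):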

SELECT
  colA,
  colB,
  DATE(serverTimeStamp) AS newDate,
  eventType,
  UD.gender,
  UD.locationKey,
  pgSource,
  COUNT(DISTINCT instanceId)
FROM (
  SELECT
    *,
    MAX(IF(LOWER(parameters.name)="pagesource", parameters.value, NULL)) WITHIN RECORD AS pgSource
  FROM
    TABLE_DATE_RANGE(myTableA, TIMESTAMP('2016-02-02 00:00:00'), TIMESTAMP('2016-02-02 23:59:59')) ) EV
JOIN EACH
  replicated.UserDimension AS UD
ON
  UD.userId = EV.userId
WHERE
  LOWER(parameters.name)="isfirstcontact"
GROUP EACH BY
  colA,
  colB,
  newDate,
  eventType,
  pgSource,
  UD.gender,
  UD.locationKey

Any tips on how to deal with this?

** Mikhail kindly pointed out that having multiple userIds in the first table will throw my counts off. How do I adjust for that?

1 answer:

Answer 0: (score: 1)

  

Any tips?

Just to give you an idea: run the query below and see the difference.

SELECT 
  COUNT(*), 
  COUNT(1), 
  COUNT(instanceId), 
  COUNT(DISTINCT instanceId) 
FROM
  (SELECT NULL AS instanceId),
  (SELECT 1 AS instanceId),
  (SELECT 2 AS instanceId),
  (SELECT 1 AS instanceId)
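
For readers without a BigQuery project at hand, the same four aggregates can be checked locally. This is a minimal sketch that uses Python's built-in sqlite3 module as a stand-in for BigQuery. That is an assumption: SQLite is not BigQuery, and the legacy-SQL comma between subqueries (which unions them) is written here as UNION ALL. It shows that COUNT(*) and COUNT(1) count every row, COUNT(instanceId) skips NULLs, and COUNT(DISTINCT instanceId) counts each non-NULL value once.

```python
import sqlite3

# SQLite stands in for BigQuery here (assumption). The legacy-SQL comma
# between subqueries is expressed as UNION ALL.
QUERY = """
SELECT
  COUNT(*),
  COUNT(1),
  COUNT(instanceId),
  COUNT(DISTINCT instanceId)
FROM (
  SELECT NULL AS instanceId
  UNION ALL SELECT 1
  UNION ALL SELECT 2
  UNION ALL SELECT 1
)
"""


def count_variants():
    """Return (COUNT(*), COUNT(1), COUNT(col), COUNT(DISTINCT col))."""
    conn = sqlite3.connect(":memory:")
    try:
        return conn.execute(QUERY).fetchone()
    finally:
        conn.close()


print(count_variants())  # (4, 4, 3, 2)
```

The two queries in the question use different aggregates (COUNT(*) in the first, COUNT(DISTINCT instanceId) in the second), so they would not necessarily agree even without the join.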

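Also, I recommend checking the difference between COUNT([DISTINCT] ...) and EXACT_COUNT_DISTINCT(). In legacy SQL, COUNT(DISTINCT ...) returns a statistical approximation once the number of distinct values exceeds a threshold (1000 by default), while EXACT_COUNT_DISTINCT() is always exact.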

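Another direction to investigate: check whether either side of the join has duplicate userIds (multiple rows for the same userId). That could also be the source of the count mismatch.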
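
The effect of duplicate userIds can be sketched on toy data. The example below again uses sqlite3 as a stand-in for BigQuery, and the tables `events` and `user_dimension` and their rows are hypothetical, not the asker's schema. It finds userIds with more than one row on the dimension side, shows how the join multiplies event rows for them, and shows one possible fix: collapsing the dimension table to one row per user before joining. The collapse uses GROUP BY rather than SELECT DISTINCT so the pattern carries over to legacy SQL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Hypothetical stand-ins for the event table and replicated.UserDimension
CREATE TABLE events (instanceId INTEGER, userId TEXT);
CREATE TABLE user_dimension (userId TEXT, gender TEXT);
INSERT INTO events VALUES (1, 'u1'), (2, 'u1'), (3, 'u2');
-- 'u1' appears twice on the dimension side
INSERT INTO user_dimension VALUES ('u1', 'F'), ('u1', 'F'), ('u2', 'M');
""")


def scalar(sql):
    """Run a query that returns a single value and return that value."""
    return conn.execute(sql).fetchone()[0]


# userIds that have more than one row in the dimension table
duplicates = conn.execute("""
SELECT userId, COUNT(*) AS n
FROM user_dimension
GROUP BY userId
HAVING COUNT(*) > 1
""").fetchall()

# Row count before the join, after a plain join, and after joining
# against a dimension table collapsed to one row per user
before_join = scalar("SELECT COUNT(*) FROM events")
after_join = scalar("""
SELECT COUNT(*)
FROM events EV
JOIN user_dimension UD ON UD.userId = EV.userId
""")
deduped_join = scalar("""
SELECT COUNT(*)
FROM events EV
JOIN (
  SELECT userId, gender
  FROM user_dimension
  GROUP BY userId, gender
) UD ON UD.userId = EV.userId
""")

print(duplicates, before_join, after_join, deduped_join)
# [('u1', 2)] 3 5 3
```

Grouping by every selected column only removes exact duplicates. If the same userId carries conflicting attribute values, a rule for picking one row per user is still needed.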