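My problem: after I added some logic (SAFE_DIVIDE) to a BigQuery #standardSQL statement, I started getting duplicate rows. The problem appears only after I add this line: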
SAFE_DIVIDE( u.weekly_capacity/25200, 1) AS TargetDailyHours
If I can't solve this, I may have to write all the logic in Data Studio instead. The current workflow is Harvest -> Stitch -> BigQuery -> Data Studio.
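In this query I take the table time_entries, LEFT JOIN it on MAX(updated_at) (the most recent version of each time entry), and FULL JOIN the result to the users table, restricted to currently active users. I want to work with the data so I can compute each FTE's actual hours worked / weekly_capacity. However, as soon as I add any logic or BigQuery function, the results are duplicated. Why?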
SELECT DISTINCT outer_e.hours, outer_e.id, outer_e.updated_at,
outer_e.spent_date, outer_e.created_at,
outer_e.client_id, outer_e.user_id AS harvest_userid,
u.is_admin, u.first_name, u.is_active, u.id AS user_id,
u.weekly_capacity,
client.name as names,
--SAFE_DIVIDE( u.weekly_capacity /25200, 1) AS TargetDailyHours
FROM
(SELECT e.id, MAX(e.updated_at) AS updated_at FROM `harvest-experiment.harvest.time_entries` AS e
GROUP BY e.id LIMIT 1000
) AS inner_e
LEFT JOIN `harvest-experiment.harvest.time_entries` AS outer_e
ON inner_e.id = outer_e.id AND inner_e.updated_at = outer_e.updated_at
FULL JOIN ( SELECT DISTINCT id, first_name, weekly_capacity, is_active, is_admin FROM `harvest-experiment.harvest.users` WHERE is_active = true
) AS u
ON outer_e.user_id = u.id
JOIN (SELECT DISTINCT id ,
name FROM `harvest-experiment.harvest.clients`) AS client
ON outer_e.client_id = client.id
In the results, the weekly_capacity column starts showing the same person with different weekly capacity values, for example:
Row | hours | id        | updated_at              | spent_date              | created_at              | client_id | harvest_userid | is_admin | first_name | is_active | user_id | weekly_capacity | TargetDailyHours
1   | 0.22  | 995005338 | 2019-05-07 15:14:13 UTC | 2019-04-29 00:00:00 UTC | 2019-04-29 15:30:40 UTC | 6864491   | 2622223        | false    | Nolan      | true      | 2622223 | 72000           | 2.857142857142857
2   | 0.22  | 995005338 | 2019-05-07 15:14:13 UTC | 2019-04-29 00:00:00 UTC | 2019-04-29 15:30:40 UTC | 6864491   | 2622223        | false    | Nolan      | true      | 2622223 | 129600          | 5.142857142857143
In this result, user Nolan's time entry with id 995005338 (0.22 hours) appears twice, and weekly_capacity changes from 72000 in Row 1 to 129600 in Row 2.
Answer 0 (score: 0)
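The actual problem is the u.weekly_capacity column, which has two or more different values for the same user. The SAFE_DIVIDE expression only exposes it. The join matches each time entry to every users row with the same id, and SELECT DISTINCT removes only rows that are identical in every selected column; because the matched rows differ in weekly_capacity (and therefore in TargetDailyHours), both survive.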
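The fan-out can be reproduced on a small scale. The sketch below is an illustration only: it uses an in-memory SQLite database as a stand-in for BigQuery, simplified versions of the question's tables, and the two weekly_capacity values from the question's output. Plain division by 25200.0 stands in for SAFE_DIVIDE, because SQLite has no SAFE_DIVIDE and truncates integer division.

```python
import sqlite3

# In-memory SQLite as a stand-in for BigQuery (assumption for illustration).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE time_entries (id INTEGER, user_id INTEGER, hours REAL);
CREATE TABLE users (id INTEGER, first_name TEXT, weekly_capacity INTEGER, is_active INTEGER);

-- One time entry for user 2622223.
INSERT INTO time_entries VALUES (995005338, 2622223, 0.22);

-- Two active rows for the same user id with different weekly_capacity values.
INSERT INTO users VALUES (2622223, 'Nolan', 72000, 1);
INSERT INTO users VALUES (2622223, 'Nolan', 129600, 1);
""")

# The join matches the single time entry to both users rows; DISTINCT keeps
# both results because weekly_capacity and TargetDailyHours differ.
rows = conn.execute("""
SELECT DISTINCT e.id, e.hours, u.first_name, u.weekly_capacity,
       u.weekly_capacity / 25200.0 AS TargetDailyHours
FROM time_entries AS e
JOIN (SELECT DISTINCT id, first_name, weekly_capacity
      FROM users WHERE is_active = 1) AS u
  ON e.user_id = u.id
ORDER BY u.weekly_capacity
""").fetchall()

for row in rows:
    print(row)
```

The same time entry id appears once per matching users row, which is exactly the pattern in Row 1 and Row 2 of the question.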
You can trace these duplicate values to the "u" subquery:
SELECT DISTINCT id, first_name, weekly_capacity, is_active, is_admin
FROM `harvest-experiment.harvest.users`
WHERE is_active = true
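To see which ids are affected, one option is to group the output of that subquery by id and keep the ids that still have more than one row. The sketch below runs this check against an in-memory SQLite table; the Nolan rows come from the question, while the second user (id 1111111, "Ada") is hypothetical and has a single row, so it should not be reported. In BigQuery the same query would use `is_active = true`.

```python
import sqlite3

# In-memory SQLite as a stand-in for the BigQuery users table (assumption).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE users (id INTEGER, first_name TEXT, weekly_capacity INTEGER, is_active INTEGER);
INSERT INTO users VALUES (2622223, 'Nolan', 72000, 1);
INSERT INTO users VALUES (2622223, 'Nolan', 129600, 1);
INSERT INTO users VALUES (1111111, 'Ada', 144000, 1);  -- hypothetical single-row user
""")

# List every active user id that still has more than one distinct row after
# the DISTINCT in the "u" subquery, i.e. the ids that fan out the join.
dupes = conn.execute("""
SELECT id, COUNT(*) AS row_count,
       MIN(weekly_capacity) AS min_capacity,
       MAX(weekly_capacity) AS max_capacity
FROM (SELECT DISTINCT id, first_name, weekly_capacity, is_active
      FROM users WHERE is_active = 1)
GROUP BY id
HAVING COUNT(*) > 1
""").fetchall()
print(dupes)
```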
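The users table contains two or more rows with the same id where is_active = true. This appears to be a data issue, so to avoid duplicate rows you have to decide which row's value to keep. For example, if you only want to keep the maximum value, you can use GROUP BY: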
SELECT id, first_name, MAX(weekly_capacity) as weekly_capacity, is_active, is_admin
FROM `harvest-experiment.harvest.users`
WHERE is_active = true
GROUP BY id, first_name, is_active, is_admin
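Plugging that aggregated subquery into the join should leave one row per time entry. The sketch below checks this on the same simplified SQLite tables as before (a stand-in for BigQuery, with sample rows from the question). One caveat: GROUP BY produces one row per distinct (id, first_name, is_active, is_admin) combination, so if first_name or is_admin also differ between a user's rows, duplicates would remain.

```python
import sqlite3

# In-memory SQLite as a stand-in for BigQuery (assumption for illustration).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE time_entries (id INTEGER, user_id INTEGER, hours REAL);
CREATE TABLE users (id INTEGER, first_name TEXT, weekly_capacity INTEGER,
                    is_active INTEGER, is_admin INTEGER);
INSERT INTO time_entries VALUES (995005338, 2622223, 0.22);
INSERT INTO users VALUES (2622223, 'Nolan', 72000, 1, 0);
INSERT INTO users VALUES (2622223, 'Nolan', 129600, 1, 0);
""")

rows = conn.execute("""
SELECT e.id, e.hours, u.weekly_capacity,
       u.weekly_capacity / 25200.0 AS TargetDailyHours
FROM time_entries AS e
JOIN (
  -- Collapse each user to one row, keeping the largest weekly_capacity.
  SELECT id, first_name, MAX(weekly_capacity) AS weekly_capacity, is_active, is_admin
  FROM users
  WHERE is_active = 1
  GROUP BY id, first_name, is_active, is_admin
) AS u
  ON e.user_id = u.id
""").fetchall()
print(rows)
```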
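Alternatively, if your users table has enough information, you can use other columns to narrow the results further, for example a timestamp column (called last_updated below):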
...
LEFT JOIN `harvest-experiment.harvest.time_entries` AS outer_e
ON inner_e.id = outer_e.id AND inner_e.updated_at = outer_e.updated_at
FULL JOIN (
SELECT DISTINCT id, first_name, weekly_capacity, is_active, is_admin, last_updated
FROM `harvest-experiment.harvest.users` WHERE is_active = true
) AS u
ON outer_e.user_id = u.id AND outer_e.updated_at = u.last_updated
...
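A related option, different from matching timestamps in the join condition, is to pick one row per user before joining: number each user's rows with ROW_NUMBER() ordered by the timestamp and keep only the newest. The sketch below assumes the users table has such a column (the last_updated name and its values are hypothetical) and runs on in-memory SQLite, which needs version 3.25 or later for window functions; ROW_NUMBER is also available in BigQuery Standard SQL.

```python
import sqlite3

# In-memory SQLite as a stand-in for BigQuery (assumption for illustration).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE time_entries (id INTEGER, user_id INTEGER, hours REAL);
CREATE TABLE users (id INTEGER, first_name TEXT, weekly_capacity INTEGER,
                    is_active INTEGER, last_updated TEXT);
INSERT INTO time_entries VALUES (995005338, 2622223, 0.22);
-- Hypothetical last_updated values; ISO strings sort chronologically.
INSERT INTO users VALUES (2622223, 'Nolan', 129600, 1, '2019-03-01 09:00:00');
INSERT INTO users VALUES (2622223, 'Nolan', 72000, 1, '2019-05-01 09:00:00');
""")

rows = conn.execute("""
SELECT e.id, e.hours, u.weekly_capacity
FROM time_entries AS e
JOIN (
  SELECT id, first_name, weekly_capacity
  FROM (
    -- Number each user's rows from newest to oldest.
    SELECT id, first_name, weekly_capacity,
           ROW_NUMBER() OVER (PARTITION BY id ORDER BY last_updated DESC) AS rn
    FROM users
    WHERE is_active = 1
  )
  WHERE rn = 1  -- keep only the newest row per user
) AS u
  ON e.user_id = u.id
""").fetchall()
print(rows)
```

Unlike MAX(weekly_capacity), this keeps whichever value was recorded most recently, which may be closer to a user's current capacity.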