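I'm new to BigQuery and need to create a permanent table from a log file and a reference table. I'm doing something wrong but can't figure out what. Please help.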
Log file (sample)
-Event_Time,USER_ID,Type
-1/1/2017,123_abc,a
-1/2/2017,123_bcd,B
-1/2/2017,123_abc,C
Reference table (sample)
-Type Partner
-a 1
-b 2
-c 3
-d 3
create table workarea.SummaryTable AS (
User_ID string,
TotalCount integer,
imps_time timestamp,
Partner integer)
insert into workarea.SummaryTable
select distinct User_ID,
COUNT(*) as TotalCount,
MIN(TIME) as imps_time,
SUM(case when Partner = '1' then 1 else 0 end) as 1,
SUM(case when Partner = '2' then 1 else 0 end) as 2,
SUM(case when Partner = '3' then 1 else 0 end) as 3
from workarea.logfile i
join workarea.referencetable r on i.Type=r.Type
where CID=10848805
group by USER_ID
Answer 0 (score: 1)
"... and doing something wrong but can't figure out why"
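Points of failure I have identified so far:
At the time of writing, the CREATE TABLE statement is not available in BigQuery. Its Data Manipulation Language only allows you to INSERT, DELETE and UPDATE, so you need to have the table pre-created before you can insert data into it.
The fragment below is also incorrect, because an unquoted column alias cannot be a bare number: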
SUM(CASE WHEN Partner = '1' THEN 1 ELSE 0 END) AS 1,
SUM(CASE WHEN Partner = '2' THEN 1 ELSE 0 END) AS 2,
SUM(CASE WHEN Partner = '3' THEN 1 ELSE 0 END) AS 3
You should use something like this instead:
SUM(CASE WHEN Partner = '1' THEN 1 ELSE 0 END) AS Partner_1,
SUM(CASE WHEN Partner = '2' THEN 1 ELSE 0 END) AS Partner_2,
SUM(CASE WHEN Partner = '3' THEN 1 ELSE 0 END) AS Partner_3
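To try the renamed aliases in isolation, here is a minimal sketch that runs without any tables. The inline `joined` rows are hypothetical stand-ins for the result of joining the log file to the reference table, and `COUNTIF` is swapped in as a BigQuery Standard SQL shorthand for the `SUM(CASE ...)` pattern; it is not what the original query uses.

```sql
#standardSQL
-- Hypothetical inline rows standing in for logfile JOIN referencetable.
-- Assumption: Partner is a STRING, as the comparisons in the question imply.
WITH joined AS (
  SELECT '123_abc' AS User_ID, '1' AS Partner UNION ALL
  SELECT '123_bcd', '2' UNION ALL
  SELECT '123_abc', '3'
)
SELECT
  User_ID,
  COUNT(*) AS TotalCount,
  -- COUNTIF(cond) counts the rows where cond is TRUE,
  -- equivalent to SUM(CASE WHEN cond THEN 1 ELSE 0 END).
  COUNTIF(Partner = '1') AS Partner_1,
  COUNTIF(Partner = '2') AS Partner_2,
  COUNTIF(Partner = '3') AS Partner_3
FROM joined
GROUP BY User_ID
```

If Partner is actually stored as an integer, the comparisons would need `Partner = 1` rather than `Partner = '1'`.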
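Some fields do not appear to exist in the tables you reference, yet you use them in the final query: time in MIN(time) AS imps_time, and CID in WHERE CID=10848805.
The destination table's schema has 4 fields, whereas the SELECT statement produces 6. You should sort this out: they must match!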
Possible "solution" (BigQuery Standard SQL)
I assume (just to make some progress here) that the destination table's schema actually looks like this:
User_ID STRING,
TotalCount INT64,
imps_time TIMESTAMP,
Partner_1 INT64,
Partner_2 INT64,
Partner_3 INT64
In that case, the query below should produce the right result for the insert:
#standardSQL
SELECT
User_ID,
COUNT(*) AS TotalCount,
MIN(Event_Time) AS imps_time,
SUM(CASE WHEN Partner = '1' THEN 1 ELSE 0 END) AS Partner_1,
SUM(CASE WHEN Partner = '2' THEN 1 ELSE 0 END) AS Partner_2,
SUM(CASE WHEN Partner = '3' THEN 1 ELSE 0 END) AS Partner_3
FROM `workarea.logfile` i
JOIN `workarea.referencetable` r ON i.Type=r.Type
-- WHERE CID=10848805
GROUP BY USER_ID
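The query above only produces the rows; loading them still requires an INSERT into an existing table. The following is a sketch under several assumptions rather than a tested solution: it relies on the DDL support that BigQuery added after this answer was written, it assumes Event_Time is already a TIMESTAMP and Partner is a STRING, and it leaves out the CID filter because that column does not appear in the sample data.

```sql
-- Assumption: current BigQuery Standard SQL, where CREATE TABLE is available.
-- Without DDL, the table must be pre-created through the UI, CLI or API.
CREATE TABLE IF NOT EXISTS workarea.SummaryTable (
  User_ID   STRING,
  TotalCount INT64,
  imps_time TIMESTAMP,
  Partner_1 INT64,
  Partner_2 INT64,
  Partner_3 INT64
);

-- The explicit column list keeps the six SELECT columns aligned
-- with the six destination columns.
INSERT INTO workarea.SummaryTable
  (User_ID, TotalCount, imps_time, Partner_1, Partner_2, Partner_3)
SELECT
  i.User_ID,
  COUNT(*) AS TotalCount,
  MIN(i.Event_Time) AS imps_time,  -- assumes Event_Time is a TIMESTAMP
  SUM(CASE WHEN r.Partner = '1' THEN 1 ELSE 0 END) AS Partner_1,
  SUM(CASE WHEN r.Partner = '2' THEN 1 ELSE 0 END) AS Partner_2,
  SUM(CASE WHEN r.Partner = '3' THEN 1 ELSE 0 END) AS Partner_3
FROM workarea.logfile AS i
JOIN workarea.referencetable AS r ON i.Type = r.Type
GROUP BY i.User_ID;
```

If Event_Time is stored as a string such as 1/1/2017, it would first need converting (for example with PARSE_TIMESTAMP and a matching format string) before it can go into a TIMESTAMP column.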