Question

If I pull the full data from all existing ga_sessions_ or Firebase tables, Bytes Processed is 4.5 GB.

If I save the result of that query to a destination table and then pull the full data from that table, Bytes Processed is 217 GB.

Both tables have the same size. Why does this difference occur?

Update

My standardSQL query:

SELECT _TABLE_SUFFIX AS Date, 
user_dim.app_info.app_instance_id, 
user_dim.app_info.app_version, 
user_dim.geo_info.city, 
user_properties.key, 
event.name 
FROM `project.dataset.app_events_*`, 
UNNEST(user_dim.user_properties) AS user_properties, 
UNNEST(event_dim) AS event

This processes 4.5 GB. If I save the result as a table (called historical_data), I then run this query:

SELECT *
FROM `project.dataset.historical_data`

and it processes 217 GB.

Answer 1

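I think this is possible because of the double cross join: for every cross-joined row you now have a redundant copy of the following set of fields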

_TABLE_SUFFIX AS Date, 
user_dim.app_info.app_instance_id, 
user_dim.app_info.app_version, 
user_dim.geo_info.city

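So even though the original tables are only 4.5 GB, the result is 217 GB. The first query is billed only for the columns it reads from the source tables, where each of these fields is stored once per source row; the saved table stores them again for every (user property, event) pair, and SELECT * then scans all of it.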
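To make the explosion concrete, here is a minimal Python sketch that mimics the two comma (cross) joins on one toy row. The field values are hypothetical, and the size estimate (total UTF-8 length of the string values) is a simplification, not BigQuery's actual storage or billing accounting.

```python
from typing import Any, Dict, Iterable, List

# Hypothetical, simplified stand-in for one app_events_* row:
# scalar fields are stored once, the two repeated fields are lists.
Row = Dict[str, Any]
SCALARS = ("date", "app_instance_id", "app_version", "city")


def flatten(rows: List[Row]) -> List[Dict[str, str]]:
    """Mimic FROM t, UNNEST(user_properties), UNNEST(event_dim)."""
    out = []
    for row in rows:
        for prop_key in row["user_properties"]:
            for event_name in row["event_dim"]:
                flat = {f: row[f] for f in SCALARS}  # scalars copied into every output row
                flat["key"] = prop_key
                flat["name"] = event_name
                out.append(flat)
    return out


def approx_bytes(values: Iterable[str]) -> int:
    """Rough size estimate: total UTF-8 length of all string values."""
    return sum(len(v.encode("utf-8")) for v in values)


source = [{
    "date": "20170101",
    "app_instance_id": "a" * 32,
    "app_version": "1.0.0",
    "city": "Berlin",
    "user_properties": ["first_open_time", "plan", "language"],
    "event_dim": ["session_start", "screen_view", "purchase", "user_engagement"],
}]

flat = flatten(source)
source_bytes = approx_bytes(
    [r[f] for r in source for f in SCALARS]
    + [v for r in source for v in r["user_properties"] + r["event_dim"]]
)
flat_bytes = approx_bytes(v for r in flat for v in r.values())
print(len(flat), source_bytes, flat_bytes)
```

With 3 user properties and 4 events, the single source row becomes 12 output rows, and the four scalar fields are written 12 times instead of once; real rows with many more properties and events multiply the size much further.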

Makes sense to me. This is one of the fun things about BigData: if you are not careful enough, results can explode to a huge size.

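And, by the way, check the number of rows in the original tables versus the output table (historical_data versus the source ga_sessions_ tables).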
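When comparing row counts, it helps to know what to expect. In BigQuery Standard SQL the comma before UNNEST is a CROSS JOIN, so each source row yields (number of user properties) × (number of events) rows, and a row whose array is empty or NULL yields none (LEFT JOIN UNNEST would keep it). The Python sketch below only computes that expected count on hypothetical toy data; in practice you would compare COUNT(*) on both tables in BigQuery.

```python
from typing import List, Optional, Sequence, Tuple

# Each hypothetical source row is reduced to its two repeated fields:
# (user_properties, event_dim). None stands in for a NULL array.
SourceRow = Tuple[Optional[List[str]], Optional[List[str]]]


def expected_output_rows(rows: Sequence[SourceRow]) -> int:
    """Rows produced by FROM t, UNNEST(props), UNNEST(events)."""
    total = 0
    for props, events in rows:
        # Cross join: empty or NULL arrays contribute zero rows.
        total += len(props or []) * len(events or [])
    return total


source_rows: List[SourceRow] = [
    (["plan", "language"], ["session_start", "purchase", "screen_view"]),  # 2 * 3 = 6
    (["plan"], []),                    # no events: row disappears
    (None, ["session_start"]),         # NULL array: row disappears
    (["plan", "language", "theme"], ["session_start"] * 5),  # 3 * 5 = 15
]

print(len(source_rows), expected_output_rows(source_rows))
```

Four source rows become 21 output rows here, and two of the source rows vanish entirely, so the row counts can differ in both directions.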
