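I'm using BigQuery on exported GA data (see the schema here).
Looking at the documentation, I see that when I select a field within a record, it automatically flattens that record and duplicates the surrounding columns.
So I tried to create a denormalized table that I could query with a more SQL-like mindset: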
SELECT
CONCAT( date, " ", if (hits.hour < 10,
CONCAT("0", STRING(hits.hour)),
STRING(hits.hour)), ":", IF(hits.minute < 10, CONCAT("0", STRING(hits.minute)), STRING(hits.minute)) ) AS hits.date__STRING,
CONCAT(fullVisitorId, STRING(visitId)) AS session_id__STRING,
fullVisitorId AS google_identity__STRING,
MAX(IF(hits.customDimensions.index=7, hits.customDimensions.value,NULL)) WITHIN RECORD AS customer_id__LONG,
hits.hitNumber AS hit_number__INT,
hits.type AS hit_type__STRING,
hits.isInteraction AS hit_is_interaction__BOOLEAN,
hits.isEntrance AS hit_is_entrance__BOOLEAN,
hits.isExit AS hit_is_exit__BOOLEAN,
hits.promotion.promoId AS promotion_id__STRING,
hits.promotion.promoName AS promotion_name__STRING,
hits.promotion.promoCreative AS promotion_creative__STRING,
hits.promotion.promoPosition AS promotion_position__STRING,
hits.eventInfo.eventCategory AS event_category__STRING,
hits.eventInfo.eventAction AS event_action__STRING,
hits.eventInfo.eventLabel AS event_label__STRING,
hits.eventInfo.eventValue AS event_value__INT,
device.language AS device_language__STRING,
device.screenResolution AS device_resolution__STRING,
device.deviceCategory AS device_category__STRING,
device.operatingSystem AS device_os__STRING,
geoNetwork.country AS geo_country__STRING,
geoNetwork.region AS geo_region__STRING,
hits.page.searchKeyword AS hit_search_keyword__STRING,
hits.page.searchCategory AS hits_search_category__STRING,
hits.page.pageTitle AS hits_page_title__STRING,
hits.page.pagePath AS page_path__STRING,
hits.page.hostname AS page_hostname__STRING,
hits.eCommerceAction.action_type AS commerce_action_type__INT,
hits.eCommerceAction.step AS commerce_action_step__INT,
hits.eCommerceAction.option AS commerce_action_option__STRING,
hits.product.productSKU AS product_sku__STRING,
hits.product.v2ProductName AS product_name__STRING,
hits.product.productRevenue AS product_revenue__INT,
hits.product.productPrice AS product_price__INT,
hits.product.productQuantity AS product_quantity__INT,
hits.product.productRefundAmount AS hits.product.product_refund_amount__INT,
hits.product.v2ProductCategory AS product_category__STRING,
hits.transaction.transactionId AS transaction_id__STRING,
hits.transaction.transactionCoupon AS transaction_coupon__STRING,
hits.transaction.transactionRevenue AS transaction_revenue__INT,
hits.transaction.transactionTax AS transaction_tax__INT,
hits.transaction.transactionShipping AS transaction_shipping__INT,
hits.transaction.affiliation AS transaction_affiliation__STRING,
hits.appInfo.screenName AS app_current_name__STRING,
hits.appInfo.screenDepth AS app_screen_depth__INT,
hits.appInfo.landingScreenName AS app_landing_screen__STRING,
hits.appInfo.exitScreenName AS app_exit_screen__STRING,
hits.exceptionInfo.description AS exception_description__STRING,
hits.exceptionInfo.isFatal AS exception_is_fatal__BOOLEAN
FROM
[98513938.ga_sessions_20151112]
HAVING
customer_id__LONG IS NOT NULL
AND customer_id__LONG != 'NA'
AND customer_id__LONG != ''
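I write the results of this query to another table, denorm (flattened, large dataset).
When I query denorm with the clause WHERE session_id_STRING = "100001897901013346771447300813", I get different results
than when I wrap the above query, which produces the desired results: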
SELECT * FROM (_above query_) as foo where session_id_STRING = 100001897901013346771447300813
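I'm sure this is by design, but it would be very helpful if someone could explain the difference between these two approaches.
Answer 0 (score: 0)
I take it you did check "Flatten Results" when you created the output table? I also assume from your question that session_id_STRING is a repeated field?
If those assumptions are correct, then what you are seeing is exactly the behavior described in the documentation you cited. You asked BigQuery to "flatten results", so it turned your repeated field into a non-repeated one and duplicated all the fields around it, giving you a flat table (i.e., one with no repeated data).
If the behavior you see when querying the subquery is the one you want, you should uncheck that box when creating the table.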
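Answer 1 (score: 0)
"Looking at the documentation, I see that when I select a field within a record, it automatically flattens that record and duplicates the surrounding columns."
This is not correct. By the way, could you point me to that documentation? It needs to be improved.
Selecting a field does not flatten the record. So if you have a table T with a single record {a = 1, b = (2, 2, 3)}, then after running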
SELECT * FROM T WHERE b = 2
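you still get a single record, {a = 1, b = (2, 2)}. SELECT COUNT(a) over this subquery returns 1.
But once you write the results of this query with flatten = on, you get two records: {a = 1, b = 2} and {a = 1, b = 2}. SELECT COUNT(a) from the flattened table returns 2.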
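To make the difference concrete, here is a minimal Python sketch that models the two behaviors on the toy table T from above. It is only an illustration of the semantics described in this answer, not BigQuery code: the helper names filter_repeated and flatten are hypothetical, and the repeated field b is represented as a plain Python list.

```python
def filter_repeated(records, field, predicate):
    """Toy model of WHERE on a repeated field without flattening.

    Each record stays whole; only the non-matching repeated values are
    dropped. A record with no matching value is removed entirely.
    (Hypothetical helper, not a BigQuery API.)
    """
    result = []
    for record in records:
        kept = [value for value in record[field] if predicate(value)]
        if kept:
            filtered = dict(record)   # shallow copy so T is not modified
            filtered[field] = kept
            result.append(filtered)
    return result


def flatten(records, field):
    """Toy model of flatten = on.

    Emits one output row per repeated value and duplicates the
    surrounding columns onto every row. (Hypothetical helper.)
    """
    return [{**record, field: value}
            for record in records
            for value in record[field]]


# The single-record table T from the answer: {a = 1, b = (2, 2, 3)}
T = [{"a": 1, "b": [2, 2, 3]}]

nested = filter_repeated(T, "b", lambda value: value == 2)
flat = flatten(nested, "b")

print(nested)       # [{'a': 1, 'b': [2, 2]}]
print(len(nested))  # 1, like SELECT COUNT(a) over the subquery
print(flat)         # [{'a': 1, 'b': 2}, {'a': 1, 'b': 2}]
print(len(flat))    # 2, like SELECT COUNT(a) over the flattened table
```

The row count changes only at the flatten step, which is why the same filter can give different results against the saved, flattened table than against the wrapped subquery.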