Google BigQuery: getting the number of new visitors while using a custom dimension

Time: 2018-05-01 01:44:00

Tags: google-bigquery

select   PARSE_DATE('%Y%m%d', t.date) as Date
        ,count(distinct(fullvisitorid)) as User
       ,SUM( totals.newVisits ) AS New_Visitors
       ,(if(customDimensions.index=1, customDimensions.value,null))  as Orig
FROM `table` as t
CROSS JOIN UNNEST(hits) AS hit
CROSS JOIN UNNEST(hit.customDimensions ) AS customDimensions  
group by Date, orig

Is there a way to get the number of new visitors while also using a customDimension? SUM(totals.newVisits) does not work.

Thanks

2 answers:

Answer 0: (score: 1)

The following is for BigQuery Standard SQL

   
SELECT DATE 
  ,COUNT(DISTINCT(fullvisitorid)) AS User
  ,SUM( newVisits ) AS New_Visitors
  ,Orig
FROM (
  SELECT PARSE_DATE('%Y%m%d', t.date) AS DATE
    ,fullvisitorid
    ,totals.newVisits AS newVisits
    ,(IF(customDimensions.index=1, customDimensions.value,NULL))  AS Orig
  FROM `table` AS t
  CROSS JOIN UNNEST(hits) AS hit
  CROSS JOIN UNNEST(hit.customDimensions ) AS customDimensions  
  GROUP BY DATE, orig, fullvisitorid, newVisits
)
GROUP BY DATE, Orig

Answer 1: (score: 1)

In your case, the best approach is to drop the cross joins and use a subselect:

SELECT
  PARSE_DATE('%Y%m%d', t.date) AS Date
  ,(SELECT value FROM UNNEST(customDimensions) WHERE index=1) Orig
  ,COUNT(DISTINCT(fullvisitorid)) AS User
  ,SUM( totals.newVisits ) AS New_Visitors
FROM
  `table` t
GROUP BY Orig, Date
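
As a rough illustration of why this helps, here is a small Python simulation (not BigQuery code). The sample sessions, their values, and the helper `custom_dimension` are made up for the sketch; it only assumes that the table has one row per session and that `totals.newVisits` is 1 for a new visitor and NULL otherwise.

```python
# Hypothetical sessions shaped like rows of the GA export table (values are made up).
sessions = [
    {"fullVisitorId": "A", "newVisits": 1,
     "customDimensions": [{"index": 1, "value": "organic"}],
     "hits": ["/home", "/pricing", "/signup"]},
    {"fullVisitorId": "B", "newVisits": None,  # returning visitor: field is NULL
     "customDimensions": [{"index": 1, "value": "organic"}],
     "hits": ["/home", "/blog"]},
]

def custom_dimension(session, index):
    """Mimic (SELECT value FROM UNNEST(customDimensions) WHERE index = N)."""
    values = [cd["value"] for cd in session["customDimensions"] if cd["index"] == index]
    return values[0] if values else None

# One row per session (no cross join): newVisits is summed once per session.
per_session = {}
for s in sessions:
    key = custom_dimension(s, 1)
    per_session[key] = per_session.get(key, 0) + (s["newVisits"] or 0)

# One row per hit (what CROSS JOIN UNNEST(hits) produces): newVisits repeats on every hit.
per_hit = {}
for s in sessions:
    for _hit in s["hits"]:
        key = custom_dimension(s, 1)
        per_hit[key] = per_hit.get(key, 0) + (s["newVisits"] or 0)

print(per_session)  # {'organic': 1}
print(per_hit)      # {'organic': 3}
```

Keeping one row per session gives the expected single new visitor, while the flattened version counts that visitor once per hit.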

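If you have a hit-scoped dimension and really do need to flatten the table, you need to build a session ID that you can count distinct. This is because applying the cross join repeats all session-scoped fields on every hit: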

SELECT
  PARSE_DATE('%Y%m%d', t.date) AS Date
  ,(SELECT value FROM h.customDimensions WHERE index=2) justAHitCd
  ,h.page.pagePathLevel1
  ,COUNT(DISTINCT(fullvisitorid)) AS User

  -- create session id and count distinct
  ,COUNT(DISTINCT CONCAT(fullvisitorid, CAST(visitstarttime AS STRING)) ) AS all_sessions

  -- only count distinct session id of sessions where totals.newVisits = 1
  ,COUNT(DISTINCT 
    IF(totals.newVisits=1, 
      CONCAT(fullvisitorid, CAST(visitstarttime AS STRING)), 
      NULL )
   ) AS New_Visitors

FROM
  -- flatten table to hit scope (a comma means cross join in Standard SQL)
  `table` t, t.hits h
GROUP BY 1,2,3

So for new visitors, I only supply the session ID when totals.newVisits = 1; otherwise the IF expression returns NULL, which COUNT(DISTINCT ...) does not count.
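
The same counting logic can be sketched in Python to make the effect visible. This is only a simulation under stated assumptions: the flattened rows and their values are invented, and `session_id` is a hypothetical helper standing in for the CONCAT expression in the query.

```python
# Hypothetical flattened rows (one per hit), as produced by `table` t, t.hits h.
# Session-scoped fields (fullVisitorId, visitStartTime, newVisits) repeat on every hit.
rows = [
    {"fullVisitorId": "A", "visitStartTime": 1525100000, "newVisits": 1, "pagePathLevel1": "/home"},
    {"fullVisitorId": "A", "visitStartTime": 1525100000, "newVisits": 1, "pagePathLevel1": "/home"},
    {"fullVisitorId": "A", "visitStartTime": 1525100000, "newVisits": 1, "pagePathLevel1": "/blog"},
    {"fullVisitorId": "B", "visitStartTime": 1525103600, "newVisits": None, "pagePathLevel1": "/home"},
]

def session_id(row):
    """Mimic CONCAT(fullvisitorid, CAST(visitstarttime AS STRING))."""
    return row["fullVisitorId"] + str(row["visitStartTime"])

all_sessions = {}   # page -> set of distinct session IDs
new_visitors = {}   # page -> set of distinct session IDs where newVisits = 1
for row in rows:
    page = row["pagePathLevel1"]
    all_sessions.setdefault(page, set()).add(session_id(row))
    new_ids = new_visitors.setdefault(page, set())
    if row["newVisits"] == 1:  # the IF(...) branch; otherwise nothing is added (NULL)
        new_ids.add(session_id(row))

result = {page: (len(all_sessions[page]), len(new_visitors[page])) for page in all_sessions}

# For contrast: a plain SUM over the flattened rows counts visitor A twice on /home.
naive_sum_home = sum(r["newVisits"] or 0 for r in rows if r["pagePathLevel1"] == "/home")

print(result)          # {'/home': (2, 1), '/blog': (1, 1)}
print(naive_sum_home)  # 2
```

Each tuple is (all_sessions, New_Visitors) for that page; the distinct session ID keeps visitor A at one new visit on /home even though two hits landed there.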

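If you have something similar at product scope, you need to create an ID that takes both the session and the hit into account. For example, counting pageviews per productSku: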

SELECT
  PARSE_DATE('%Y%m%d', t.date) AS Date
  ,(SELECT value FROM h.customDimensions WHERE index=2) justAHitCd
  ,p.productSku
  ,COUNT(DISTINCT fullvisitorid) AS users
  ,COUNT(DISTINCT CONCAT(fullvisitorid, CAST(visitstarttime AS STRING))) AS sessions
  ,COUNT(DISTINCT 
    IF(h.type='PAGE',
      CONCAT(fullvisitorid, cast(visitstarttime AS STRING),CAST(hitNumber AS STRING)),
      NULL)  
  ) as pageviews
  ,COUNT(1) AS products
FROM
  `table` t, t.hits h LEFT JOIN h.product p
GROUP BY 1,2,3

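Note that I LEFT JOIN the product array. Because it is sometimes empty, a cross join would drop all information for those hits: a cross join with an empty table yields an empty table.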
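
The difference between the two joins can be sketched in Python. The two sample hits and their SKUs are made up, and the list comprehensions only imitate what the joins against the product array return; this is not BigQuery code.

```python
# Hypothetical hits: the second one has an empty product array.
hits = [
    {"hitNumber": 1, "product": [{"productSku": "SKU-1"}, {"productSku": "SKU-2"}]},
    {"hitNumber": 2, "product": []},
]

# Cross join with the product array: a hit with no products yields no rows at all.
cross_join = [(h["hitNumber"], p["productSku"]) for h in hits for p in h["product"]]

# Left join with the product array: a hit with no products is kept,
# with NULL (None here) in place of the product fields.
left_join = [
    (h["hitNumber"], p["productSku"] if p else None)
    for h in hits
    for p in (h["product"] or [None])
]

print(cross_join)  # [(1, 'SKU-1'), (1, 'SKU-2')]
print(left_join)   # [(1, 'SKU-1'), (1, 'SKU-2'), (2, None)]
```

With the cross join, hit 2 disappears, so any users, sessions, or pageviews that only occur on product-less hits would be lost from the counts.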

Hope that helps!