Removing duplicate rows based on attributes in Google BigQuery SQL

Posted: 2017-05-09 09:04:20

Tags: sql google-bigquery

I have a table named result, and I'm using BigQuery to select data from Google Analytics (GA):

SELECT
  Date,
  totals.pageviews,
  h.transaction.transactionId,
  h.item.itemQuantity,
  h.transaction.transactionRevenue,
  totals.bounces,
  fullvisitorid,
  totals.timeOnSite,
  device.browser,
  device.deviceCategory,
  trafficSource.source,
  channelGrouping,
  h.page.pagePath,
  h.eventInfo.eventCategory,
  device.operatingSystem
FROM
  `atomic-life-148403.126959513.ga_sessions_*`,
  UNNEST(hits) AS h
WHERE
  _TABLE_SUFFIX BETWEEN REPLACE(CAST(DATE_ADD(CURRENT_DATE(), INTERVAL -1 YEAR) AS STRING), '-','')
  AND CONCAT('intraday_', REPLACE(CAST(DATE_ADD(CURRENT_DATE(), INTERVAL 0 DAY) AS STRING), '-',''))
ORDER BY
  date DESC
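The WHERE clause depends on plain string comparison of _TABLE_SUFFIX. Digits sort before letters, so every YYYYMMDD suffix from one year ago onward falls below 'intraday_...'. Intraday suffixes up to today's date are also included. The rough Python sketch below shows this. It assumes a fixed "today" in place of CURRENT_DATE() and uses a few hypothetical suffixes.

```python
from datetime import date

# Fixed "today" so the example is reproducible (the query uses CURRENT_DATE()).
today = date(2017, 5, 9)
one_year_ago = today.replace(year=today.year - 1)

# Mirrors REPLACE(CAST(... AS STRING), '-', '') and the CONCAT('intraday_', ...) bound.
lower = one_year_ago.isoformat().replace("-", "")
upper = "intraday_" + today.isoformat().replace("-", "")

# Hypothetical table suffixes matched by ga_sessions_*.
suffixes = ["20160508", "20160509", "20170508", "intraday_20170509", "intraday_20170510"]

# BETWEEN on strings is an inclusive lexicographic comparison.
selected = [s for s in suffixes if lower <= s <= upper]
print(selected)  # ['20160509', '20170508', 'intraday_20170509']
```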


Some records are duplicated. How can I remove the duplicate records from the table?

I want to get the following result: [screenshot of the expected result]
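Here is one likely source of the repeats. This is an assumption, not something the post confirms. The comma join with UNNEST(hits) emits one row per hit and copies the session-level columns onto every row. If two hits share the same values in the selected hit fields, they produce identical output rows. Below is a minimal Python sketch of that flattening. The data is hypothetical.

```python
# Hypothetical session records shaped loosely like ga_sessions_* rows.
sessions = [
    {
        "fullvisitorid": "v1",
        "date": "20170508",
        "hits": [
            {"pagePath": "/home", "eventCategory": None},
            {"pagePath": "/home", "eventCategory": None},  # same selected fields as the previous hit
            {"pagePath": "/cart", "eventCategory": "checkout"},
        ],
    },
]

# Equivalent of FROM sessions, UNNEST(hits) AS h (a CROSS JOIN):
# one output row per hit, with the session-level columns copied onto each row.
rows = [
    (s["date"], s["fullvisitorid"], h["pagePath"], h["eventCategory"])
    for s in sessions
    for h in s["hits"]
]

print(len(rows), len(set(rows)))  # 3 2
```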

6 answers:

Answer 0 (score: 1)

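You can use ROW_NUMBER(). The following is SQL Server (T-SQL) syntax. BigQuery's DELETE statement cannot target a CTE: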

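-- Number the rows within each transactionid. Ordering by the partition key
-- itself means which copy gets rn = 1 is arbitrary.
WITH CTE AS (
  SELECT *,
         ROW_NUMBER() OVER (PARTITION BY transactionid ORDER BY transactionid) AS rn
  FROM [YourTable]
)
-- Deleting through the CTE removes the underlying rows numbered 2 and above.
DELETE FROM CTE
WHERE rn > 1;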
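You can try the same ROW_NUMBER() dedup locally with the sketch below. It uses Python's built-in sqlite3 module as a stand-in for a real warehouse. The table and data are hypothetical, and window functions need SQLite 3.25 or later. SQLite cannot delete through a CTE, so the sketch numbers the rows in a subquery and deletes by SQLite's implicit rowid.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE result (transactionid TEXT, revenue INTEGER)")
conn.executemany(
    "INSERT INTO result VALUES (?, ?)",
    [("T1", 100), ("T1", 100), ("T2", 50), ("T3", 75), ("T3", 75), ("T3", 75)],
)

# Number the rows inside each transactionid group (earliest inserted first),
# then delete every row whose number is above 1, identified by its rowid.
conn.execute(
    """
    DELETE FROM result
    WHERE rowid IN (
        SELECT rid FROM (
            SELECT rowid AS rid,
                   ROW_NUMBER() OVER (PARTITION BY transactionid ORDER BY rowid) AS rn
            FROM result
        )
        WHERE rn > 1
    )
    """
)

remaining = conn.execute(
    "SELECT transactionid, revenue FROM result ORDER BY transactionid"
).fetchall()
print(remaining)  # [('T1', 100), ('T2', 50), ('T3', 75)]
```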

Answer 1 (score: 1)

Below is for BigQuery Standard SQL:

#standardSQL
SELECT DISTINCT
  Date,
  totals.pageviews,
  h.transaction.transactionId,
  h.item.itemQuantity,
  h.transaction.transactionRevenue,
  totals.bounces,
  fullvisitorid,
  totals.timeOnSite,
  device.browser,
  device.deviceCategory,
  trafficSource.source,
  channelGrouping,
  h.page.pagePath,
  h.eventInfo.eventCategory,
  device.operatingSystem
FROM
  `atomic-life-148403.126959513.ga_sessions_*`,
  UNNEST(hits) AS h
WHERE
  _TABLE_SUFFIX BETWEEN REPLACE(CAST(DATE_ADD(CURRENT_DATE(), INTERVAL -1 YEAR) AS STRING), '-','')
  AND CONCAT('intraday_', REPLACE(CAST(DATE_ADD(CURRENT_DATE(), INTERVAL 0 DAY) AS STRING), '-',''))
ORDER BY
  date DESC

As you can see, I just added DISTINCT to your SELECT. For more information, see SELECT and its modifiers in the BigQuery Standard SQL documentation.
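DISTINCT compares the whole selected row. It only removes rows that match in every column. Rows that share a transactionId but differ in, say, pagePath all survive. The small sketch below uses Python's sqlite3 module as a local stand-in for BigQuery, with hypothetical data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE hits (fullvisitorid TEXT, transactionId TEXT, pagePath TEXT)")
conn.executemany(
    "INSERT INTO hits VALUES (?, ?, ?)",
    [
        ("v1", "T1", "/checkout"),
        ("v1", "T1", "/checkout"),   # exact duplicate: collapsed by DISTINCT
        ("v1", "T1", "/thank-you"),  # differs only in pagePath: kept
    ],
)

rows = conn.execute(
    "SELECT DISTINCT fullvisitorid, transactionId, pagePath FROM hits ORDER BY pagePath"
).fetchall()
print(rows)  # [('v1', 'T1', '/checkout'), ('v1', 'T1', '/thank-you')]
```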

Answer 2 (score: 0)

#standardSQL
-- `your_dataset.YourTable` is a placeholder for the full table name
SELECT DISTINCT *
FROM `your_dataset.YourTable`
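SELECT DISTINCT * only deduplicates the query output. To remove the duplicates from the stored table, you have to write the distinct rows back. In BigQuery this can be done by saving the query result to a destination table, and BigQuery DDL also offers CREATE OR REPLACE TABLE ... AS SELECT. The sketch below shows the same copy-and-swap idea in SQLite via Python's sqlite3 module. The table and data are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE result (transactionid TEXT, revenue INTEGER)")
conn.executemany(
    "INSERT INTO result VALUES (?, ?)",
    [("T1", 100), ("T1", 100), ("T2", 50)],
)

# Copy the distinct rows into a new table, then swap it in place of the original.
conn.execute("CREATE TABLE result_dedup AS SELECT DISTINCT * FROM result")
conn.execute("DROP TABLE result")
conn.execute("ALTER TABLE result_dedup RENAME TO result")

rows = conn.execute("SELECT * FROM result ORDER BY transactionid").fetchall()
print(rows)  # [('T1', 100), ('T2', 50)]
```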

Answer 3 (score: 0)

You can use the ROW_NUMBER() analytic function, for example:

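#standardSQL
SELECT * EXCEPT(rownum)  -- drop the helper column from the output
FROM (
  SELECT *,
         -- Number rows within each transactionId; which copy gets 1 is arbitrary.
         -- All rows with a NULL transactionId share one partition, so only one survives.
         ROW_NUMBER() OVER (PARTITION BY transactionId ORDER BY transactionId) AS rownum
  FROM `your_dataset.result`  -- placeholder: the dataset that holds the result table
)
WHERE rownum = 1;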
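Partitioning by transactionId on GA hit-level data has a catch. Most hits carry no transaction, and all NULL transactionIds fall into a single partition. As a result, rownum = 1 keeps only one of them. The sketch below shows the effect, using Python's sqlite3 module as a stand-in with hypothetical rows. The partition key has to describe exactly what counts as a duplicate.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE result (transactionid TEXT, pagepath TEXT)")
conn.executemany(
    "INSERT INTO result VALUES (?, ?)",
    [
        ("T1", "/thank-you"),
        ("T1", "/thank-you"),  # real duplicate
        (None, "/home"),       # hits without a transaction...
        (None, "/product"),    # ...all share the NULL partition
    ],
)

kept = conn.execute(
    """
    SELECT transactionid, pagepath FROM (
        SELECT *,
               ROW_NUMBER() OVER (PARTITION BY transactionid ORDER BY transactionid) AS rownum
        FROM result
    )
    WHERE rownum = 1
    """
).fetchall()

# One T1 row plus only ONE of the two distinct NULL-transaction rows survives.
print(len(kept))  # 2
```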

Answer 4 (score: 0)

You can keep the unique rows and delete the others:

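-- Keep one row per KeyField and delete the rest.
-- Requires a per-row identifier such as SQLite's rowid;
-- BigQuery tables have none, so this form does not apply there.
DELETE FROM MyTable
WHERE rowid NOT IN (
   SELECT MIN(rowid)
   FROM MyTable
   GROUP BY KeyField
);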
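Some engines expose a per-row identifier. There, "keep the unique rows and delete the others" can be expressed by keeping the smallest rowid for each KeyField. The sketch below uses Python's sqlite3 module and SQLite's implicit rowid, with a hypothetical table and data. BigQuery tables have no such identifier, so in BigQuery the usual route is to rewrite the table from a deduplicated query.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE MyTable (KeyField TEXT, payload TEXT)")
conn.executemany(
    "INSERT INTO MyTable VALUES (?, ?)",
    [("a", "x"), ("a", "x"), ("b", "y"), ("c", "z"), ("c", "z")],
)

# Keep the first physical row (smallest rowid) for each KeyField; delete the rest.
conn.execute(
    """
    DELETE FROM MyTable
    WHERE rowid NOT IN (
        SELECT MIN(rowid) FROM MyTable GROUP BY KeyField
    )
    """
)

rows = conn.execute("SELECT KeyField, payload FROM MyTable ORDER BY KeyField").fetchall()
print(rows)  # [('a', 'x'), ('b', 'y'), ('c', 'z')]
```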

Answer 5 (score: 0)

Using GROUP BY on all selected columns removes any rows that are exact duplicates across those columns:

SELECT
  Date,
  totals.pageviews,
  h.transaction.transactionId,
  h.item.itemQuantity,
  h.transaction.transactionRevenue,
  totals.bounces,
  fullvisitorid,
  totals.timeOnSite,
  device.browser,
  device.deviceCategory,
  trafficSource.source,
  channelGrouping,
  h.page.pagePath,
  h.eventInfo.eventCategory,
  device.operatingSystem
FROM
  `atomic-life-148403.126959513.ga_sessions_*`,
  UNNEST(hits) AS h
WHERE
  _TABLE_SUFFIX BETWEEN REPLACE(CAST(DATE_ADD(CURRENT_DATE(), INTERVAL -1 YEAR) AS STRING), '-','')
  AND CONCAT('intraday_', REPLACE(CAST(DATE_ADD(CURRENT_DATE(), INTERVAL 0 DAY) AS STRING), '-',''))
GROUP BY
  Date,
  totals.pageviews,
  h.transaction.transactionId,
  h.item.itemQuantity,
  h.transaction.transactionRevenue,
  totals.bounces,
  fullvisitorid,
  totals.timeOnSite,
  device.browser,
  device.deviceCategory,
  trafficSource.source,
  channelGrouping,
  h.page.pagePath,
  h.eventInfo.eventCategory,
  device.operatingSystem
ORDER BY
  date  DESC;
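Grouping by every selected column produces the same rows as SELECT DISTINCT over those columns. The sketch below checks this, using Python's sqlite3 module as a local stand-in with hypothetical hit rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE hits (date TEXT, fullvisitorid TEXT, pagepath TEXT)")
conn.executemany(
    "INSERT INTO hits VALUES (?, ?, ?)",
    [
        ("20170508", "v1", "/home"),
        ("20170508", "v1", "/home"),  # exact duplicate
        ("20170508", "v1", "/cart"),
        ("20170509", "v2", "/home"),
    ],
)

grouped = conn.execute(
    "SELECT date, fullvisitorid, pagepath FROM hits "
    "GROUP BY date, fullvisitorid, pagepath ORDER BY date DESC, pagepath"
).fetchall()
distinct = conn.execute(
    "SELECT DISTINCT date, fullvisitorid, pagepath FROM hits "
    "ORDER BY date DESC, pagepath"
).fetchall()

print(grouped == distinct, len(grouped))  # True 3
```

A side benefit of the GROUP BY form is that you can add COUNT(*) to the select list to see how many copies of each row existed.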