I have a table named result. I am using BigQuery to select data from GA:
SELECT
Date,
totals.pageviews,
h.transaction.transactionId,
h.item.itemQuantity,
h.transaction.transactionRevenue,
totals.bounces,
fullvisitorid,
totals.timeOnSite,
device.browser,
device.deviceCategory,
trafficSource.source,
channelGrouping,
h.page.pagePath,
h.eventInfo.eventCategory,
device.operatingSystem
FROM
`atomic-life-148403.126959513.ga_sessions_*`,
UNNEST(hits) AS h
WHERE
_TABLE_SUFFIX BETWEEN REPLACE(CAST(DATE_ADD(CURRENT_DATE(), INTERVAL -1 YEAR) AS STRING), '-','')
AND CONCAT('intraday_', REPLACE(CAST(DATE_ADD(CURRENT_DATE(), INTERVAL 0 DAY) AS STRING), '-',''))
ORDER BY
date DESC
Some of the records are duplicated. How can I remove the duplicate records from the table?
Answer 0: (score: 1)
You can use ROW_NUMBER:
WITH CTE AS
(SELECT *, ROW_NUMBER() OVER (PARTITION BY transactionid ORDER BY
transactionid) AS rn FROM [YourTable])
-- SQL Server (T-SQL) syntax: deleting through the CTE keeps row 1 of each transactionid
DELETE FROM CTE
WHERE rn > 1
Answer 1: (score: 1)
Below is for BigQuery Standard SQL:
#standardSQL
SELECT DISTINCT
Date,
totals.pageviews,
h.transaction.transactionId,
h.item.itemQuantity,
h.transaction.transactionRevenue,
totals.bounces,
fullvisitorid,
totals.timeOnSite,
device.browser,
device.deviceCategory,
trafficSource.source,
channelGrouping,
h.page.pagePath,
h.eventInfo.eventCategory,
device.operatingSystem
FROM
`atomic-life-148403.126959513.ga_sessions_*`,
UNNEST(hits) AS h
WHERE
_TABLE_SUFFIX BETWEEN REPLACE(CAST(DATE_ADD(CURRENT_DATE(), INTERVAL -1 YEAR) AS STRING), '-','')
AND CONCAT('intraday_', REPLACE(CAST(DATE_ADD(CURRENT_DATE(), INTERVAL 0 DAY) AS STRING), '-',''))
ORDER BY
date DESC
As you can see, I just added DISTINCT to your SELECT. See "SELECT and its modifiers" in the BigQuery Standard SQL documentation for more information.
Answer 2: (score: 0)
SELECT DISTINCT *
FROM [YourTable]
Answer 3: (score: 0)
You can use the ROW_NUMBER() analytic function, for example:
select * from (
select *,
ROW_NUMBER() OVER(PARTITION BY transactionid ORDER BY transactionid) rownum
from result ) xxx
where rownum = 1;
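To see what the ROW_NUMBER() filter above actually keeps, here is a minimal local sketch. It uses Python's built-in sqlite3 module (SQLite 3.25 or newer, for window functions) as a stand-in for BigQuery. The sample rows and the pagepath and revenue columns are hypothetical, and the ORDER BY uses pagepath rather than transactionid so that the surviving row is deterministic.

```python
import sqlite3

# Hypothetical sample rows standing in for the flattened `result` table.
rows = [
    ("T1", "/checkout", 100),
    ("T1", "/checkout", 100),  # exact duplicate
    ("T1", "/thanks", 100),    # same transaction, different page
    ("T2", "/checkout", 250),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE result (transactionid TEXT, pagepath TEXT, revenue INTEGER)"
)
conn.executemany("INSERT INTO result VALUES (?, ?, ?)", rows)

# Number the rows inside each transactionid partition and keep only row 1.
deduped = conn.execute(
    """
    SELECT transactionid, pagepath, revenue
    FROM (
        SELECT *,
               ROW_NUMBER() OVER (
                   PARTITION BY transactionid ORDER BY pagepath
               ) AS rownum
        FROM result
    ) AS numbered
    WHERE rownum = 1
    ORDER BY transactionid
    """
).fetchall()

print(deduped)
```

Note that this keeps one row per transactionid, so the "/thanks" row is dropped even though it is not an exact duplicate. Rows whose transactionid is NULL all fall into a single partition, so with GA hit data (where most hits carry no transaction) you would probably want to partition by more columns.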
Answer 4: (score: 0)
You can select the unique rows and delete the others:
-- BigQuery Standard SQL; assumes MyTable has no STRUCT or ARRAY columns,
-- which SELECT DISTINCT * cannot compare
CREATE OR REPLACE TABLE MyTable AS
SELECT DISTINCT * FROM MyTable;
Answer 5: (score: 0)
Using GROUP BY on all of the selected columns should remove any truly duplicate rows from the result:
SELECT
Date,
totals.pageviews,
h.transaction.transactionId,
h.item.itemQuantity,
h.transaction.transactionRevenue,
totals.bounces,
fullvisitorid,
totals.timeOnSite,
device.browser,
device.deviceCategory,
trafficSource.source,
channelGrouping,
h.page.pagePath,
h.eventInfo.eventCategory,
device.operatingSystem
FROM
`atomic-life-148403.126959513.ga_sessions_*`,
UNNEST(hits) AS h
WHERE
_TABLE_SUFFIX BETWEEN REPLACE(CAST(DATE_ADD(CURRENT_DATE(), INTERVAL -1 YEAR) AS STRING), '-','')
AND CONCAT('intraday_', REPLACE(CAST(DATE_ADD(CURRENT_DATE(), INTERVAL 0 DAY) AS STRING), '-',''))
GROUP BY
Date,
totals.pageviews,
h.transaction.transactionId,
h.item.itemQuantity,
h.transaction.transactionRevenue,
totals.bounces,
fullvisitorid,
totals.timeOnSite,
device.browser,
device.deviceCategory,
trafficSource.source,
channelGrouping,
h.page.pagePath,
h.eventInfo.eventCategory,
device.operatingSystem
ORDER BY
date DESC;
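As a quick check of the claim that grouping by every selected column behaves like SELECT DISTINCT, the sketch below runs both forms on a tiny in-memory table. It again uses Python's sqlite3 module as a local stand-in for BigQuery, and the hits table and its rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE hits (date TEXT, transactionid TEXT, pagepath TEXT)")
conn.executemany(
    "INSERT INTO hits VALUES (?, ?, ?)",
    [
        ("20240102", "T1", "/checkout"),
        ("20240102", "T1", "/checkout"),  # exact duplicate
        ("20240102", "T1", "/thanks"),
        ("20240101", None, "/home"),
        ("20240101", None, "/home"),      # exact duplicate with a NULL field
    ],
)

# Deduplicate with DISTINCT over the selected columns.
distinct_rows = conn.execute(
    "SELECT DISTINCT date, transactionid, pagepath FROM hits "
    "ORDER BY date DESC, pagepath"
).fetchall()

# Deduplicate by grouping on the same columns.
grouped_rows = conn.execute(
    "SELECT date, transactionid, pagepath FROM hits "
    "GROUP BY date, transactionid, pagepath "
    "ORDER BY date DESC, pagepath"
).fetchall()

print(distinct_rows)
print(grouped_rows == distinct_rows)
```

Both queries collapse only rows that match on every selected column, including rows where a column is NULL, so the "/thanks" hit survives. The GROUP BY form is mainly useful when you also want an aggregate such as COUNT(*) to see how many copies each row had.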