I have a query of the following form:
with t1 as (
select id, col2
from atable)
select
distinct id
from t1
sample 100
inner join t1 as t2 on t1.id = t2.id
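It returns error 3706: "expected something between an integer and the inner keyword".
When I comment out the line sample 100, the query runs fine.
My final goal is to get a sample from t1. However, since an ID can occur multiple times in t1, I don't want sample to break those rows apart. I would like to avoid working with a sampled dataset in which, because I used the sample keyword, the event history of an ID is split up or has missing entries. In other words, I want to pull a sample of IDs and then use it to filter my table t1.
That way, the event history of every sampled ID in t1 will be complete.
How can I do that?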
Answer 0 (score: 2)
SAMPLE is executed after GROUP BY / HAVING / QUALIFY, but before the DISTINCT operation and ORDER BY. You need to move the Sample into the CTE:
with t1 as (
select id, col2
from atable
sample 100
)
select
distinct id
from t1
inner join t1 as t2 on t1.id = t2.id
Based on your comment, you want to apply Sample to the distinct values:
with t1 as (
select id
from atable
group by id -- Distinct is calculated after Sample
sample 100
)
select t.*
from atable as t
join t1
on t.id = t1.id -- atable is aliased as t here; no t2 alias exists in this query
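To make the difference between sampling rows and sampling IDs concrete, here is a minimal sketch in plain Python rather than Teradata SQL. The data, the function names `sample_rows` and `sample_ids_then_filter`, and the seed are all hypothetical and only model the logic; they say nothing about how Teradata implements SAMPLE internally.

```python
import random
from collections import Counter


def sample_rows(rows, n, seed=0):
    """Row-level sampling: pick n rows at random, ignoring which ID owns them."""
    rng = random.Random(seed)
    return rng.sample(rows, n)


def sample_ids_then_filter(rows, n_ids, seed=0):
    """ID-level sampling: pick n_ids distinct IDs, then keep every row of those IDs."""
    rng = random.Random(seed)
    distinct_ids = sorted({row_id for row_id, _ in rows})
    chosen = set(rng.sample(distinct_ids, n_ids))
    return [row for row in rows if row[0] in chosen]


# Hypothetical data: 50 IDs with 4 events each (200 rows in total).
rows = [(i, f"event-{k}") for i in range(50) for k in range(4)]
full_counts = Counter(row_id for row_id, _ in rows)

# Row-level sample: 40 rows, but an ID may contribute only part of its history.
by_row = sample_rows(rows, 40)

# ID-level sample: 10 IDs, each with its complete history.
by_id = sample_ids_then_filter(rows, n_ids=10)
by_id_counts = Counter(row_id for row_id, _ in by_id)
```

In the ID-level result, the row count per ID matches the count in the full data, which is the property the join against the sampled ID list is meant to guarantee.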
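If atable is large, the distinct operation might use a lot of resources (the result is spooled first, before the Sample is applied), and a nested Sample should improve performance: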
with t1 as (
select id
from
( select id
from atable
-- reduce the number of rows for the following Group By
sample 10000 -- sample must be large enough to have 100 distinct IDs
) as t
group by id -- Distinct is calculated after Sample
sample 100
)
select t.*
from atable as t
join t1
on t.id = t1.id -- atable is aliased as t here; no t2 alias exists in this query
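The nested approach can be sketched the same way in plain Python (again not Teradata SQL). The function name `nested_sample_ids`, the data, and the sizes are hypothetical. Raising an error when the inner sample is too small is a choice made in this sketch to make the "large enough" condition visible; it is not a claim about what Teradata does in that case.

```python
import random


def nested_sample_ids(rows, inner_rows, n_ids, seed=0):
    """Two-stage sampling: shrink the row set first, then sample distinct IDs."""
    rng = random.Random(seed)
    # Stage 1: cheap row-level sample to reduce the input of the distinct step.
    inner = rng.sample(rows, inner_rows)
    distinct_ids = sorted({row_id for row_id, _ in inner})
    # The inner sample must contain at least n_ids distinct IDs.
    if len(distinct_ids) < n_ids:
        raise ValueError("inner sample too small for the requested number of IDs")
    # Stage 2: sample the IDs themselves.
    return set(rng.sample(distinct_ids, n_ids))


# Hypothetical data: 200 IDs with 5 events each (1000 rows in total).
rows = [(i, k) for i in range(200) for k in range(5)]

# 300 rows always cover at least 60 distinct IDs here, so 20 IDs are available.
chosen = nested_sample_ids(rows, inner_rows=300, n_ids=20)

# Filter the full table, so every chosen ID keeps its complete history.
result = [row for row in rows if row[0] in chosen]
```

Note that the final filter runs against the full row set, not the inner sample, which mirrors the outer join back to atable in the query above.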