Using the SAMPLE keyword in a nested query

Time: 2018-06-22 09:58:41

Tags: sql teradata

I have a query of the following form:

with t1 as (
  select id, col2
  from atable) 
select
  distinct id
from t1
sample 100
inner join t1 as t2 on t1.id = t2.id

It returns error 3706, "expected something between an integer and the inner keyword".

When I comment out the line sample 100, the query runs fine.

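My ultimate goal is to get a sample from t1. However, since an id can appear multiple times in t1, I don't want sample to break those rows apart. I want to avoid a sampled dataset in which an id's event history is split up or missing entries because of the sample keyword. In other words, I want to pull a sample of ids and then use it to filter my table t1.

That way, the event history of every sampled id from t1 will be complete.

How can I do this?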

1 Answer:

Answer 0 (score: 2)

SAMPLE is executed after GROUP BY / HAVING / QUALIFY, but before the DISTINCT operation and ORDER BY. You need to move the sample into the CTE:

with t1 as (
  select id, col2
  from atable
  sample 100
) 
select
  distinct id
from t1
inner join t1 as t2 on t1.id = t2.id

Based on your comment, you want to apply the sample to the distinct values:

with t1 as (
  select id
  from atable
  group by id -- Distinct is calculated after Sample
  sample 100
) 
select t.*
from atable as t
join t1 
  on t.id = t1.id
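
As a sanity check on the query above, the following sketch counts how many distinct ids and how many rows the filtered result contains. It reuses atable and id from the question; the aliases sampled_ids and total_rows are made up for illustration. Assuming atable holds at least 100 distinct ids, sampled_ids should come back as 100, while total_rows depends on how many events each sampled id has.

```sql
with t1 as (
  select id
  from atable
  group by id -- one row per id before Sample is applied
  sample 100
)
select
  count(distinct t.id) as sampled_ids, -- hypothetical alias: number of ids picked
  count(*) as total_rows               -- hypothetical alias: all event rows for those ids
from atable as t
join t1
  on t.id = t1.id
```

Because SAMPLE is random, each run picks a different set of ids, so total_rows will vary between runs.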

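If atable is large, the distinct operation may use a lot of resources (the result is spooled before the sample is taken), and a nested SAMPLE should improve performance: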

with t1 as (
  select id
  from 
   ( select id 
     from atable
                  -- reduce the number of rows for the following Group By
     sample 10000 -- sample must be large enough to have 100 distinct IDs
   ) as t
  group by id -- Distinct is calculated after Sample
  sample 100
) 
select t.*
from atable as t
join t1 
  on t.id = t1.id
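
The comment on the inner query requires the first sample to contain at least 100 distinct ids. One way to check this before relying on the nested version is to count the distinct ids in a 10000-row sample, as in this sketch. The aliases s, d and distinct_ids are illustrative only, and because SAMPLE is random the count will differ from run to run.

```sql
select count(*) as distinct_ids -- hypothetical alias: distinct ids found in the inner sample
from
 ( select id
   from
    ( select id
      from atable
      sample 10000 -- same inner sample size as in the query above
    ) as s
   group by id -- collapse the sampled rows to one row per id
 ) as d
```

If the count is regularly close to or below 100, the inner sample size is probably too small and should be increased.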