Selecting random values from a grouped dataset

Time: 2019-10-14 11:30:18

Tags: sql postgresql select group-by

I have some familiarity with SQL. I am currently using the following query:

select count(*) as countis, avclassfamily
from malwarehashesandstrings
where behaviouralbinary IS true and
       avclassfamily != 'SINGLETON'
group by avclassfamily
ORDER BY countis desc
LIMIT 50;

For each group produced by the avclassfamily column, I want to select 3 random hashes from the malwarehashsha256 column.

Edit: the following query works, so the question is resolved:

select m.avclassfamily, m.cnt,
       array_agg(malwarehashsha256)
from (select malwarehashesandstrings.*,
             count(*) over (partition by avclassfamily) as cnt,
             row_number() over (partition by avclassfamily order by random()) as seqnum
      from malwarehashesandstrings
      where behaviouralbinary and
            avclassfamily <> 'SINGLETON'
     ) as m
where seqnum <= 3
group by m.avclassfamily, m.cnt
order by m.cnt desc
limit 50;

1 answer:

Answer 0: (score: 1)

If I understand correctly, you can use row_number():

select m.*
from (select m.*,
             row_number() over (partition by m.avclassfamily order by random()) as seqnum
      from malwarehashesandstrings m
      where m.behaviouralbinary and
            m.avclassfamily <> 'SINGLETON'
     ) m
where seqnum <= 3;
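
To see what the query above does on a small scale, here is a minimal sketch that runs against inline data instead of the real table. The CTE name demo, the family names, and the hash values are all made up for illustration; only the column names come from the question.

```sql
-- Hypothetical sample data: 5 rows in one family, 2 in another.
WITH demo (avclassfamily, malwarehashsha256) AS (
    VALUES ('family_a', 'hash_a1'),
           ('family_a', 'hash_a2'),
           ('family_a', 'hash_a3'),
           ('family_a', 'hash_a4'),
           ('family_a', 'hash_a5'),
           ('family_b', 'hash_b1'),
           ('family_b', 'hash_b2')
)
SELECT d.avclassfamily, d.malwarehashsha256
FROM (SELECT demo.*,
             -- Number the rows of each family in a fresh random order.
             row_number() OVER (PARTITION BY avclassfamily
                                ORDER BY random()) AS seqnum
      FROM demo
     ) d
WHERE d.seqnum <= 3          -- keep at most 3 rows per family
ORDER BY d.avclassfamily;
```

Each run should return three randomly chosen rows for family_a and both rows for family_b, since a family with fewer than three rows simply returns everything it has.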

If you want this alongside the columns of your existing query, one approach is:

select m.avclassfamily, m.cnt,
       array_agg(m.malwarehashsha256)
from (select m.*,
             count(*) over (partition by m.avclassfamily) as cnt,
             row_number() over (partition by m.avclassfamily order by random()) as seqnum
      from malwarehashesandstrings m
      where m.behaviouralbinary and
            m.avclassfamily <> 'SINGLETON'
     ) m
where seqnum <= 3
group by m.avclassfamily, m.cnt;
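
The point worth noticing in this version is that count(*) over (...) is evaluated in the subquery, before the seqnum filter, so cnt is the full size of each family rather than the number of sampled rows. A minimal sketch with inline data, assuming the same made-up demo rows as a stand-in for the real table (the names demo and sample_hashes are illustrative only):

```sql
-- Hypothetical sample data: 5 rows in one family, 2 in another.
WITH demo (avclassfamily, malwarehashsha256) AS (
    VALUES ('family_a', 'hash_a1'),
           ('family_a', 'hash_a2'),
           ('family_a', 'hash_a3'),
           ('family_a', 'hash_a4'),
           ('family_a', 'hash_a5'),
           ('family_b', 'hash_b1'),
           ('family_b', 'hash_b2')
)
SELECT d.avclassfamily,
       d.cnt,                                        -- full family size
       array_agg(d.malwarehashsha256) AS sample_hashes
FROM (SELECT demo.*,
             -- Computed over every row of the family, before filtering.
             count(*) OVER (PARTITION BY avclassfamily) AS cnt,
             row_number() OVER (PARTITION BY avclassfamily
                                ORDER BY random()) AS seqnum
      FROM demo
     ) d
WHERE d.seqnum <= 3
GROUP BY d.avclassfamily, d.cnt
ORDER BY d.cnt DESC                                  -- largest families first
LIMIT 50;
```

Here family_a should report cnt = 5 with an array of three hashes, and family_b should report cnt = 2 with both of its hashes. The ORDER BY and LIMIT at the end mirror the original query's top-50 ordering; they are applied after grouping, so they do not affect which hashes are sampled.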