Sampling in PostgreSQL

Date: 2016-04-19 15:48:54

Tags: postgresql sampling

I have a table holding a set of values. A sample of the table looks like this:

ID  |  Customer_name  | workorder
1   |    abc          | dispatch
2   |    xyz          | not_dispatch
3   |    jdk          | dispatch     

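This continues for about 1M rows in total. I now want to sample this dataset down to 5000 rows: 3400 rows where workorder is "not_dispatch" and 1600 rows where it is "dispatch". How can this be done in PostgreSQL?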

1 answer:

Answer 0: (score: 1)

Far from efficient, but it works:

SELECT *
FROM (
  SELECT * FROM my_table
  WHERE workorder = 'dispatch' -- other filters
  ORDER BY random() LIMIT 1600) sub1
UNION ALL -- the two samples cannot overlap, so no de-duplication pass is needed
SELECT *
FROM (
  SELECT * FROM my_table
  WHERE workorder = 'not_dispatch' -- other filters
  ORDER BY random() LIMIT 3400) sub2;
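
If scanning and sorting the table twice is a concern, one possible alternative is to number the rows in random order within each workorder group and keep the first N of each group. This is only a sketch, not a tuned solution: it still sorts every row by random(). The temporary table and its generated data are placeholders so the example can be run on its own, and the names `ranked` and `rn` are made up for illustration.

```sql
-- Hypothetical stand-in for the real table, filled with generated rows
-- so that each workorder value has more rows than the requested sample.
CREATE TEMP TABLE my_table AS
SELECT g AS id,
       'cust_' || g AS customer_name,
       CASE WHEN g % 3 = 0 THEN 'dispatch' ELSE 'not_dispatch' END AS workorder
FROM generate_series(1, 100000) AS g;

-- Single pass: assign each row a random rank inside its workorder group,
-- then keep the first 1600 'dispatch' rows and the first 3400 'not_dispatch' rows.
SELECT id, customer_name, workorder
FROM (
  SELECT *,
         row_number() OVER (PARTITION BY workorder ORDER BY random()) AS rn
  FROM my_table
) AS ranked
WHERE (workorder = 'dispatch'     AND rn <= 1600)
   OR (workorder = 'not_dispatch' AND rn <= 3400);
```

As with the query above, a group that holds fewer rows than its quota simply contributes all of its rows, so the result can come out smaller than 5000.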