Question

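First, sorry about the question title. I don't know the statistical term for this, or what this kind of join is called.

I have a query* that generates three things: random_sex, random_first, and random_last. I'm trying to join on them with this method.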

 random_sex |   random_first   |   random_last    
------------+------------------+------------------
 male       | 47.7101715711225 | 24.3833348881337
 male       | 72.8463141907472 | 28.3560050522089
 female     | 72.8617294209544 | 33.3203859277759
 male       | 39.3406164890062 | 26.3352867371729
 female     | 28.6855500966031 | 65.8870893270099
 female     | 35.5960198949557 | 83.1188118207422
 male       | 11.5711074977927 |  10.544433838184
 male       | 15.6900786811765 | 18.7324617852545
 male       | 24.9860797089245 | 8.98265511383023
 female     | 80.4563122882508 |  35.594445341751
(10 rows)

The census data looks like this:

    name    | freq  | cumfreq | rank | name_type 
------------+-------+---------+------+-----------
 SMITH      | 1.006 |   1.006 |    1 | LAST
 JOHNSON    |  0.81 |   1.816 |    2 | LAST
 WILLIAMS   | 0.699 |   2.515 |    3 | LAST
 JONES      | 0.621 |   3.136 |    4 | LAST
 BROWN      | 0.621 |   3.757 |    5 | LAST
 DAVIS      |  0.48 |   4.237 |    6 | LAST
 MILLER     | 0.424 |    4.66 |    7 | LAST
 WILSON     | 0.339 |       5 |    8 | LAST
 MOORE      | 0.312 |   5.312 |    9 | LAST
 TAYLOR     | 0.311 |   5.623 |   10 | LAST
 ANDERSON   | 0.311 |   5.934 |   11 | LAST
 THOMAS     | 0.311 |   6.245 |   12 | LAST
 JACKSON    |  0.31 |   6.554 |   13 | LAST
 WHITE      | 0.279 |   6.834 |   14 | LAST
 HARRIS     | 0.275 |   7.109 |   15 | LAST
 MARTIN     | 0.273 |   7.382 |   16 | LAST
 THOMPSON   | 0.269 |   7.651 |   17 | LAST
 GARCIA     | 0.254 |   7.905 |   18 | LAST
 MARTINEZ   | 0.234 |    8.14 |   19 | LAST

And in this case:

 random_sex |   random_first   |   random_last    
------------+------------------+------------------
 male       | 47.7101715711225 | 24.3833348881337

I want it to join (procedurally) like this:

=# select * from census.names where cumfreq > 47.7101715711225 AND name_type = 'MALE_FIRST' order by cumfreq asc limit 1;
  name  | freq  | cumfreq | rank | name_type  
--------+-------+---------+------+------------
 SILVER | 0.009 |  47.717 | 1424 | MALE_FIRST

=# select * from census.names where cumfreq > 24.3833348881337 AND name_type = 'LAST' order by cumfreq asc limit 1;
  name  | freq  | cumfreq | rank | name_type 
--------+-------+---------+------+-----------
 HARPER | 0.054 |  24.408 |  185 | LAST

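So this man's name would be Silver Harper. I've never met one in my life, but they do exist.

I want the query above to return "Silver" and "Harper" instead of the random numbers. How can I make it work like this?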

FOOTNOTE

*: Just to keep it simple:

SELECT
   CASE WHEN RANDOM() > 0.5 THEN 'male' ELSE 'female' END AS random_sex
   , RANDOM() * 90.020 AS random_first -- dataset is 90% of most popular
   , RANDOM() * 90.483 AS random_last
FROM generate_series(1,10,1);

Answer 1

I don't really know statistics either, but I think this is what you want.

Let's name the table that returns the random columns Randoms.

WITH RANDOMS AS
(
   SELECT
   CASE WHEN RANDOM() > 0.5 THEN 'male' ELSE 'female' END AS random_sex
   , RANDOM() * 90.020 AS random_first 
   , RANDOM() * 90.483 AS random_last
   FROM generate_series(1,10,1)
)
SELECT R.random_sex,
       (
        -- first name: the lowest cumfreq above the random draw,
        -- looked up in MALE_FIRST or FEMALE_FIRST depending on random_sex
        SELECT A.NAME 
        FROM census.names A
        WHERE A.cumfreq > R.random_first
        AND A.name_type = upper(R.random_sex) || '_FIRST'
        order by A.cumfreq asc limit 1
       ) AS FIRST_NAME, 
       (
        -- last name: the same lookup against the LAST rows
        SELECT A.NAME 
        FROM census.names A
        WHERE A.cumfreq > R.random_last
        AND A.name_type = 'LAST'
        order by A.cumfreq asc limit 1
       ) AS LAST_NAME
FROM RANDOMS R ;

Answer 2

Correlated subqueries?

SELECT
  *
FROM
  yourRandomTable
INNER JOIN
  census.names         AS first_name
    -- the first-name row with the smallest cumfreq above random_first
    ON  first_name.cumfreq   = (SELECT MIN(cumfreq)
                                FROM   census.names
                                WHERE  cumfreq   > yourRandomTable.random_first
                                  AND  name_type = upper(yourRandomTable.random_sex) || '_FIRST')
    AND first_name.name_type = upper(yourRandomTable.random_sex) || '_FIRST'
INNER JOIN
  census.names         AS last_name
    -- the last-name row with the smallest cumfreq above random_last
    ON  last_name.cumfreq    = (SELECT MIN(cumfreq)
                                FROM   census.names
                                WHERE  cumfreq   > yourRandomTable.random_last
                                  AND  name_type = 'LAST')
    AND last_name.name_type  = 'LAST'

You can vary this pattern. Which variant works best depends on how your indexes are set up.
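Since every lookup in this thread filters on name_type and then takes the smallest cumfreq above a threshold, a composite index on those two columns is one setup worth trying. This is only a sketch: the index name is made up, and the column names are assumed from the table shown in the question.

```sql
-- Hypothetical index name; columns assumed from the census.names sample in the question.
-- Putting name_type first lets each lookup scan only the matching name type,
-- in cumfreq order, starting just above the random value.
CREATE INDEX names_type_cumfreq_idx
    ON census.names (name_type, cumfreq);
```

Whether the planner actually uses it depends on the table size and statistics, so it is worth checking the plan with EXPLAIN ANALYZE before and after.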

Answer 3

-- EXPLAIN ANALYZE prints the plan and timings; drop it to get the generated names.
EXPLAIN ANALYZE SELECT
  r.sex
  , COALESCE(
    (SELECT name FROM census.names AS mf WHERE r.sex = 'male' AND mf.name_type = 'MALE_FIRST' AND mf.cumfreq > r.first ORDER BY cumfreq LIMIT 1)
    , (SELECT name FROM census.names AS ff WHERE r.sex = 'female' AND ff.name_type = 'FEMALE_FIRST' AND ff.cumfreq > r.first ORDER BY cumfreq LIMIT 1)
  ) AS first
  , (SELECT name FROM census.names AS l WHERE l.name_type = 'LAST' AND l.cumfreq > r.last ORDER BY cumfreq LIMIT 1) AS last
FROM (
  SELECT
    RANDOM() * 90.020 AS first
    , RANDOM() * 90.483 AS last
    , CASE WHEN RANDOM() > 0.5 THEN 'male' ELSE 'female' END AS sex
  FROM generate_series(1,10,1)
) AS r;

This is actually what I ended up going with.
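For comparison, the same per-row lookup can probably also be written with LATERAL joins, which avoids the COALESCE over two gender-specific subqueries. This is a sketch rather than a tested replacement: it assumes PostgreSQL 9.3 or later (where LATERAL is available) and that the first-name types are exactly 'MALE_FIRST' and 'FEMALE_FIRST', as used elsewhere in this thread.

```sql
SELECT
  r.sex
  , f.name AS first
  , l.name AS last
FROM (
  SELECT
    RANDOM() * 90.020 AS first
    , RANDOM() * 90.483 AS last
    , CASE WHEN RANDOM() > 0.5 THEN 'male' ELSE 'female' END AS sex
  FROM generate_series(1,10,1)
) AS r
-- For each random row, pick the first-name row with the lowest cumfreq above r.first.
LEFT JOIN LATERAL (
  SELECT n.name
  FROM census.names AS n
  WHERE n.name_type = upper(r.sex) || '_FIRST'  -- assumed naming: MALE_FIRST / FEMALE_FIRST
    AND n.cumfreq > r.first
  ORDER BY n.cumfreq
  LIMIT 1
) AS f ON true
-- Same lookup against the last names.
LEFT JOIN LATERAL (
  SELECT n.name
  FROM census.names AS n
  WHERE n.name_type = 'LAST'
    AND n.cumfreq > r.last
  ORDER BY n.cumfreq
  LIMIT 1
) AS l ON true;
```

The LEFT JOIN ... ON true keeps a random row even when no name lies above the drawn value, which matches the NULL that the scalar subqueries would return in that case.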

Answer 4

Cheat: a Cartesian product.

Select q1.Name as Forename, q2.Name as Surname
From 
(select Name from census.names where cumfreq > 47.7101715711225 
 AND name_type = 'MALE_FIRST' order by cumfreq asc limit 1) q1, 
(select Name from census.names where cumfreq > 24.3833348881337 
 AND name_type = 'LAST' order by cumfreq asc limit 1) q2

How would I join on this statistical data?
