PostgreSQL: Summing information from two aggregated tables

Time: 2016-03-05 12:11:48

Tags: postgresql join aggregate

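There is a problem with my approach or my logic.

I am trying to add together all the data from two tables. Where rows in the two tables correspond, they should be summed; where a row in either table has no counterpart, the totals of that single query should still be shown; and the final result should be ordered by year. I have tried LEFT JOINs, FULL JOINs and UNIONs. None of them sums and presents the data the way I need. The key point is that pb and th_year hold the year for which results are needed.

The error must be obvious in my code.

Each aggregate query on its own produces correct results.

It is the combination of the two queries that I am getting wrong.

Any help is appreciated.

I thought this would be simple; it is probably just stupidity on my part.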

CREATE VIEW public.cf_th_data_totals_by_year_by_wc_2
AS SELECT 

a.owner,
a.region,
a.district,
a.plantation,
b.th_year,
a.pb,
a.wc,

sum(a.tcf_calcarea + b.tth_calcarea) AS area,
sum(a.tcf_total + b.tth_total) AS total,
sum(a.tcf_ws + b.tth_ws) AS ws,
sum(a.tcf_util + b.tth_util) AS util,
sum(a.tcf_s + b.tth_s) AS s,
sum(a.tcf_a + b.tth_a) AS a,
sum(a.tcf_b + b.tth_b) AS b,
sum(a.tcf_c + b.tth_c) AS c,
sum(a.tcf_d + b.tth_d) AS d

 FROM
  (SELECT 
  cfdata.owner,
  cfdata.region,
  cfdata.district,
  cfdata.plantation,
  cfdata.pb,
  cfdata.wc,
  sum(cfdata.calcarea)AS tcf_calcarea,
  sum(cfdata._ba) AS tcf_ba,
  sum(cfdata._total) AS tcf_total,
  sum( cfdata._ws) AS tcf_ws,
  sum( cfdata._util) AS tcf_util,
  sum( cfdata._s) AS tcf_s,
  sum( cfdata._a) AS tcf_a,
  sum( cfdata._b) AS tcf_b,
  sum( cfdata._c) AS tcf_c,
  sum( cfdata._d) AS tcf_d

 FROM cfdata

 GROUP BY  cfdata.owner, cfdata.region, cfdata.district, cfdata.plantation, cfdata.pb, cfdata.wc 
 ORDER BY  cfdata.owner, cfdata.region, cfdata.district, cfdata.plantation, cfdata.pb, cfdata.wc) a


JOIN

(SELECT 
  thdata.owner,
  thdata.region,
  thdata.district,
  thdata.plantation,
  thdata.th_year,
  thdata.wc,
  sum(thdata.calcarea)AS tth_calcarea,
  sum(thdata.th_ba) AS tth_ba,
  sum(thdata.th_total) AS tth_total,
  sum(thdata.th_ws) AS tth_ws,
  sum(thdata.th_util) AS tth_util,
  sum(thdata.th_s) AS tth_s,
  sum(thdata.th_a) AS tth_a,
  sum(thdata.th_b) AS tth_b,
  sum(thdata.th_c) AS tth_c,
  sum(thdata.th_d) AS tth_d

FROM thdata

GROUP BY  thdata.owner, thdata.region, thdata.district, thdata.plantation, thdata.th_year, thdata.wc 
ORDER BY  thdata.owner, thdata.region, thdata.district, thdata.plantation, thdata.th_year, thdata.wc) b

 ON  a.owner = b.owner AND a.region = b.region AND a.district = b.district and a.plantation = b.plantation AND a.pb = b.th_year AND a.wc = b.wc

GROUP BY  a.owner, a.region, a.district, a.plantation, a.pb, b.th_year, a.wc 
ORDER BY  a.owner, a.region, a.district, a.plantation, a.pb, b.th_year, a.wc

thdata sample:

owner       region      district    plantation  compartment calcarea     wc  plantdate  th_year th_age   th_dbh  th_ht   th_vtree    th_sph  th_ba   th_total  th_ws     th_util     th_s    th_a    th_b    th_c    th_d    thdata_id
KeyProjects Northern    Marshlands  River Glen  A27         14.02       PFN 01/08/2009  2017    8        12.3    7.3     0.0289      179     28      70        14        56          42      14       0      0       0       1
KeyProjects Northern    Marshlands  River Glen  A28          2.1        ESN 01/12/2010  2012    2         4.5    4.2     0           479      2       0         0         0           0       0       0      0       0       2
KeyProjects Northern    Marshlands  River Glen  A28          2.1        ESN 01/12/2010  2014    4        10.2    9.6     0.0188      250      4      11         0         8           4       6       0      0       0       3
KeyProjects Northern    Marshlands  River Glen  A29         2.71        ESN 01/08/2009  2011    2         4.5    4.2     0           479      3       0         0         0           0       0       0      0       0       4
KeyProjects Northern    Marshlands  River Glen  A29         2.71        ESN 01/08/2009  2013    4        10.2    9.6     0.0188      250      5      14         0        11           5       8       0      0       0       5

cfdata sample:

owner       region      district    plantation  compartment  wc     pb      calcarea     cfage   dbh    ht      vtree   sph     _ba   _total   _ws   _util   _s   _a     _b    _c   _d   tmai    umai    smai    cfdata_id
KeyProjects Northern    Marshlands  River Glen  A01          EF1    2021    5.27         10      14.5   20.4    0.1109  1004     90    585      21  564      84    401    79    0   0    11.1    10.7    1.5     1
KeyProjects Northern    Marshlands  River Glen  A02          EF1    2021    36.1         10      14.5   20.4    0.1109  1004    614   4007     144  3863    578   2744   542    0   0    11.1    10.7    1.5     2
KeyProjects Northern    Marshlands  River Glen  A03          EF1    2021    5.5          10      14.5   20.4    0.1109  1004     94    611      22  589      88    418    83    0   0    11.1    10.7    1.5     3
KeyProjects Northern    Marshlands  River Glen  A04          EF1    2021    11.91        10      14.5   20.4    0.1109  1004    202   1322      48  1274    191    905   179    0   0    11.1    10.7    1.5     4
KeyProjects Northern    Marshlands  River Glen  A05          EF1    2022    39.17        11      14.9   21.8    0.1286  1000    705   5053     157  4857    666   3486   744    0   0    11.7    11.3    1.7     5

Expected result:

owner       region      district    plantation  th_year pb      wc  area    total   ws      util    s       a       b       c   d
KeyProjects Northern    Marshlands  River Glen  2008    2008    EF1 620.49  44176   1788    42389   7562    31953   2852    0   0
KeyProjects Northern    Marshlands  River Glen  2009    2009    EF1 635.65  44319   1778    42476   7634    31993   2852    0   0
KeyProjects Northern    Marshlands  River Glen  2010    2010    EF1 1202.31 87980   3453    84487   14906   63883   5704    0   0
KeyProjects Northern    Marshlands  River Glen  2011    2011    EF1 1948.37 132378  5275    127104  22662   95895   8556    0   0
KeyProjects Northern    Marshlands  River Glen  2012    2012    EF1 1378.61 87928   3429    84477   14878   63922   5704    0   0

1 Answer:

Answer 0 (score: 0)

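OK, there are a few issues with your query:

  • In the main query, do not use sum(a.tcf_calcarea + b.tth_calcarea) AS area. You can simply add the two values, but first make sure that any NULL is replaced with 0: write coalesce(a.tcf_calcarea, 0) + coalesce(b.tth_calcarea, 0) AS area, and do the same in place of every other sum(). Since you then no longer aggregate at this level, you should also drop the final GROUP BY clause.
  • Now make a FULL OUTER JOIN between the two sub-queries. That means you get all rows from both joined sub-queries, and wherever one side has no corresponding row, its column values are NULL.
  • ORDER BY in a sub-query is pointless; the planner processes the row set in whatever way it considers best. You should only order at the outermost level.
  • By definition (the join condition) b.th_year = a.pb, so you can drop one of the two columns.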
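The interaction between FULL JOIN and coalesce() described above can be checked on a few rows. This is only a minimal sketch: the CTEs a and b, the column yr and all values are invented stand-ins for the two aggregated sub-queries, not data from the question.

```sql
-- Hypothetical two-row inputs standing in for the aggregated sub-queries.
WITH a (wc, yr, tcf_total) AS (
    VALUES ('EF1', 2021, 100), ('EF1', 2022, 50)
),
b (wc, yr, tth_total) AS (
    VALUES ('EF1', 2021, 7), ('ESN', 2012, 3)
)
SELECT wc, yr,
       -- Plain addition yields NULL as soon as one side has no row.
       a.tcf_total + b.tth_total                           AS naive_total,
       -- coalesce() turns the missing side into 0 before adding.
       coalesce(a.tcf_total, 0) + coalesce(b.tth_total, 0) AS total
FROM a
FULL JOIN b USING (wc, yr)
ORDER BY wc, yr;
```

Only the EF1/2021 row exists on both sides, so it is the only row where naive_total is not NULL (107). The two unmatched rows keep their own totals (50 and 3) in the total column.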

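Some pointers on syntax:

  • Your sub-queries each use only one table, so there is no need to qualify the columns with the table name, which saves you a lot of typing.
  • More savings: use positional parameters in the GROUP BY clause, so you can write GROUP BY 1, 2, 3, 4, 5, 6. The same goes for ORDER BY.
  • In the JOIN clause you can write USING (owner, region, district, plantation, wc) and then add WHERE a.pb = b.th_year. With USING you also no longer need the sub-query alias in the main query for any of the USING columns. The fact that one join condition has no matching column name does make things messier, though: after a FULL JOIN, that WHERE filter also discards the rows that have no partner on the other side. Aliasing pb AS th_year in the first sub-query and adding th_year to the USING list avoids this.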
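The difference between comparing the year in WHERE and matching it through USING can be seen on a few rows. The sketch below uses invented CTEs a and b in place of the aggregated sub-queries; the derived table a2 and the form label column are likewise only illustrative.

```sql
-- Hypothetical stand-ins for the two aggregated sub-queries.
WITH a (wc, pb, tcf_total) AS (
    VALUES ('EF1', 2021, 100), ('EF1', 2022, 50)
),
b (wc, th_year, tth_total) AS (
    VALUES ('EF1', 2021, 7), ('ESN', 2012, 3)
)
-- Form 1: year compared in WHERE. A NULL on either side makes the
-- comparison not true, so only the matched EF1/2021 row survives.
SELECT 'where' AS form, wc, b.th_year AS th_year,
       coalesce(a.tcf_total, 0) + coalesce(b.tth_total, 0) AS total
FROM a
FULL JOIN b USING (wc)
WHERE a.pb = b.th_year
UNION ALL
-- Form 2: pb aliased to th_year and matched through USING. The merged
-- th_year column is filled from whichever side has the row.
SELECT 'using', wc, th_year,
       coalesce(a2.tcf_total, 0) + coalesce(b.tth_total, 0)
FROM (SELECT wc, pb AS th_year, tcf_total FROM a) a2
FULL JOIN b USING (wc, th_year)
ORDER BY 1, 2, 3;
```

With these rows the 'using' form returns three rows (EF1/2021, EF1/2022 and ESN/2012), while the 'where' form returns only the matched EF1/2021 row.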

Putting it all together, this is what you get:

CREATE VIEW public.cf_th_data_totals_by_year_by_wc_2 AS
  SELECT owner, region, district, plantation, th_year, wc,
         coalesce(a.tcf_calcarea, 0) + coalesce(b.tth_calcarea, 0) AS area,
         ...
  FROM (
    -- pb is aliased to th_year so that the year can take part in USING
    SELECT owner, region, district, plantation, pb AS th_year, wc,
           sum(calcarea) AS tcf_calcarea,
           ...
    FROM cfdata
    GROUP BY 1, 2, 3, 4, 5, 6) a
  FULL JOIN (
    SELECT owner, region, district, plantation, th_year, wc,
           sum(calcarea) AS tth_calcarea,
           ...
    FROM thdata
    GROUP BY 1, 2, 3, 4, 5, 6) b
  -- The year is matched here rather than in a WHERE clause: filtering on
  -- a.pb = b.th_year after a FULL JOIN would drop the unmatched rows again.
  USING (owner, region, district, plantation, th_year, wc)
  ORDER BY 1, 2, 3, 4, 5, 6;
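To try the combined query end to end, a cut-down version can be run against temporary tables. This is a sketch under stated assumptions: the tables keep only a handful of columns, the column types are guesses, most rows are taken from the samples in the question, and one thdata row is invented purely to force a match. It also assumes the variant in which pb is aliased to th_year so that the year joins through USING.

```sql
-- Reduced, hypothetical versions of the two tables (column types assumed).
CREATE TEMP TABLE cfdata (plantation text, wc text, pb int,
                          calcarea numeric, _total numeric);
CREATE TEMP TABLE thdata (plantation text, wc text, th_year int,
                          calcarea numeric, th_total numeric);

INSERT INTO cfdata VALUES
    ('River Glen', 'EF1', 2021,  5.27,  585),
    ('River Glen', 'EF1', 2021, 36.1,  4007),
    ('River Glen', 'EF1', 2022, 39.17, 5053);

INSERT INTO thdata VALUES
    ('River Glen', 'ESN', 2012,  2.1,   0),
    ('River Glen', 'ESN', 2014,  2.1,  11),
    ('River Glen', 'EF1', 2021, 10.0, 100);  -- invented row, only to force a match

SELECT plantation, th_year, wc,
       coalesce(a.tcf_calcarea, 0) + coalesce(b.tth_calcarea, 0) AS area,
       coalesce(a.tcf_total, 0)    + coalesce(b.tth_total, 0)    AS total
FROM (
    SELECT plantation, pb AS th_year, wc,
           sum(calcarea) AS tcf_calcarea,
           sum(_total)   AS tcf_total
    FROM cfdata
    GROUP BY 1, 2, 3) a
FULL JOIN (
    SELECT plantation, th_year, wc,
           sum(calcarea) AS tth_calcarea,
           sum(th_total) AS tth_total
    FROM thdata
    GROUP BY 1, 2, 3) b
USING (plantation, th_year, wc)
ORDER BY 1, 2, 3;
```

The query should return four rows: one combined EF1/2021 row that adds the two cfdata rows to the invented thdata row, plus one row each for the unmatched ESN/2012, ESN/2014 and EF1/2022 groups.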