Postgres, Rails, and selecting a column that is not in the GROUP BY clause

Time: 2019-03-15 07:11:13

Tags: ruby-on-rails postgresql

I have the following query, in which I want to group by treatment_selections.treatment_id and also select the treatments.name column:

@search = Trial.joins(:quality_datum, treatment_selections: :treatment)
.select('DISTINCT ON (treatment_selections.treatment_id) treatment_selections.treatment_id, treatments.name, AVG(quality_data.yield) as yield')
.where("EXTRACT(year from season_year) BETWEEN #{params[:start_year]} AND #{params[:end_year]}")

I get the dreaded error:

PG::GroupingError: ERROR:  column "treatment_selections.treatment_id" must appear in the GROUP BY clause or be used in an aggregate function

So I switched to the following query:

@search = Trial.joins(:quality_datum, treatment_selections: :treatment)
.select('treatments.name, treatment_selections.treatment_id, treatments.name, AVG(quality_data.yield) as yield')
.where("EXTRACT(year from season_year) BETWEEN #{params[:start_year]} AND #{params[:end_year]}")  
.group('treatment_selections.treatment_id')

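I know this won't work, because treatments.name is not referenced in the GROUP BY clause. But I figured the first query above should have worked, since I'm not grouping by anything there. I understand that columns wrapped in aggregates such as AVG and SUM don't need to be referenced in the GROUP BY clause, but what about columns that aren't used in any aggregate function?

I have seen that a nested query is one possible way to accomplish what I'm trying to do, but I'm not sure how best to implement it with the query above. Hopefully someone can help me out.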

Logs

SELECT treatment_selections.treatment_id, treatment.name, AVG(quality_data.yield) as yield FROM "trials" INNER JOIN "treatment_selections" ON "treatment_selections"."trial_id" = "trials"."id" INNER JOIN "quality_data" ON "quality_data"."treatment_selection_id" = "treatment_selections"."id" INNER JOIN "treatment_selections" "treatment_selections_trials" ON "treatment_selections_trials"."trial_id" = "trials"."id" INNER JOIN "treatments" ON "treatments"."id" = "treatment_selections_trials"."treatment_id" WHERE (EXTRACT(year from season_year) BETWEEN 2018 AND 2018) GROUP BY treatment_selections.treatment_id)

2 answers:

Answer 0: (score: 2)

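You cannot select multiple non-aggregated columns together with an aggregate function unless you group by those selected columns. Otherwise there is no way to determine how the average should be computed (over the entire dataset, or per group of some kind). You can do this: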

@search = Trial.joins(:quality_datum, treatment_selections: :treatment)
.select('treatment_selections.treatment_id, treatments.name, AVG(quality_data.yield) as yield')
.where("EXTRACT(year from season_year) BETWEEN ? AND ?", params[:start_year], params[:end_year])  
.group('treatment_selections.treatment_id, treatments.name')

Note that this may not fit your use case if one treatments.id can be associated with more than one treatments.name.

Answer 1: (score: 0)

I'm not a Rails expert, but let's analyze the logged query:

  

SELECT treatment_selections.treatment_id, treatment.name, AVG(quality_data.yield) as yield
  FROM "trials"
  INNER JOIN "treatment_selections" ON "treatment_selections"."trial_id" = "trials"."id"
  INNER JOIN "quality_data" ON "quality_data"."treatment_selection_id" = "treatment_selections"."id"
  INNER JOIN "treatment_selections" "treatment_selections_trials" ON "treatment_selections_trials"."trial_id" = "trials"."id"
  INNER JOIN "treatments" ON "treatments"."id" = "treatment_selections_trials"."treatment_id"
  WHERE (EXTRACT(year from season_year) BETWEEN 2018 AND 2018)
  GROUP BY treatment_selections.treatment_id

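Perhaps you were relying on the DISTINCT ON clause to do the job without having to specify both columns. However, as you can see in the log, it was not translated into the SQL: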

  

SELECT [missing DISTINCT ON (treatment_selections.treatment_id)] treatment_selections.treatment_id, treatment.name, AVG(quality_data.yield) as yield
  FROM "trials"
  INNER JOIN "treatment_selections" ON "treatment_selections"."trial_id" = "trials"."id"
  INNER JOIN "quality_data" ON "quality_data"."treatment_selection_id" = "treatment_selections"."id"
  INNER JOIN "treatment_selections" "treatment_selections_trials" ON "treatment_selections_trials"."trial_id" = "trials"."id"
  INNER JOIN "treatments" ON "treatments"."id" = "treatment_selections_trials"."treatment_id"
  WHERE (EXTRACT(year from season_year) BETWEEN 2018 AND 2018)
  GROUP BY treatment_selections.treatment_id

However, even if you managed to force Rails to emit DISTINCT ON, you probably would not get the result you expect, because DISTINCT ON returns only one row per treatment_id.
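To make the "one row per treatment_id" point concrete, here is a minimal plain-Ruby analogy. It is neither Rails nor SQL: the constant SAMPLE_ROWS and the method first_row_per_treatment are hypothetical names, and the rows are made-up sample data standing in for the joined, un-aggregated result set. Keeping one row per id yields a single raw yield value for each treatment, not an average.

```ruby
# Hypothetical sample rows (not taken from a real database).
SAMPLE_ROWS = [
  { treatment_id: 1, name: "treatment 1", yield: 0.50 },
  { treatment_id: 1, name: "treatment 1", yield: 0.45 },
  { treatment_id: 2, name: "treatment 2", yield: 0.65 },
  { treatment_id: 2, name: "treatment 2", yield: 0.66 },
  { treatment_id: 3, name: "treatment 3", yield: 0.85 }
].freeze

# Rough analogue of DISTINCT ON (treatment_id): keep a single row per id.
# Array#uniq with a block keeps the first row it encounters for each key.
def first_row_per_treatment(rows)
  rows.uniq { |row| row[:treatment_id] }
end

first_row_per_treatment(SAMPLE_ROWS).each do |row|
  puts "#{row[:treatment_id]}  #{row[:name]}  #{row[:yield]}"
end
```

Each treatment shows up once, but with whichever raw yield happened to come first, which is why the approach does not replace grouping plus AVG.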

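The standard SQL approach is to specify both columns as the grouping for the aggregate.

If treatment_id and treatment_name have a 1:1 relationship, then running the query without the AVG function (and without DISTINCT ON) would return data like this: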

|   treatment_id    |       name          |  yield    |  
------------------------------------------------------
|        1          |   treatment 1       |    0.50   |
|        1          |   treatment 1       |    0.45   |
|        2          |   treatment 2       |    0.65   |
|        2          |   treatment 2       |    0.66   |
|        3          |   treatment 3       |    0.85   |

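Now, to use the AVG function, you must aggregate by both treatment_id and treatment_name at the same time.

The reason you have to specify both is that the database manager assumes the columns in the result set are not related to each other. So, aggregating by both columns: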

  

SELECT treatment_selections.treatment_id, treatments.name, AVG(quality_data.yield) as yield
  FROM "trials"
  INNER JOIN "treatment_selections" ON "treatment_selections"."trial_id" = "trials"."id"
  INNER JOIN "quality_data" ON "quality_data"."treatment_selection_id" = "treatment_selections"."id"
  INNER JOIN "treatment_selections" "treatment_selections_trials" ON "treatment_selections_trials"."trial_id" = "trials"."id"
  INNER JOIN "treatments" ON "treatments"."id" = "treatment_selections_trials"."treatment_id"
  WHERE (EXTRACT(year from season_year) BETWEEN 2018 AND 2018)
  GROUP BY treatment_selections.treatment_id, treatments.name

will give you the following result:

|   treatment_id    |       name          |   AVG(yield)   |  
------------------------------------------------------------
|        1          |   treatment 1       |      0.475     |
|        2          |   treatment 2       |      0.655     |
|        3          |   treatment 3       |      0.85      |
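The same aggregation can be sketched in plain Ruby to show what grouping by both columns does. This is only an in-memory illustration, not the Rails query itself: SAMPLE_ROWS and average_yield_by_treatment are hypothetical names, and the rows copy the un-aggregated example table above.

```ruby
# Rows mirroring the un-aggregated example table (illustrative data only).
SAMPLE_ROWS = [
  { treatment_id: 1, name: "treatment 1", yield: 0.50 },
  { treatment_id: 1, name: "treatment 1", yield: 0.45 },
  { treatment_id: 2, name: "treatment 2", yield: 0.65 },
  { treatment_id: 2, name: "treatment 2", yield: 0.66 },
  { treatment_id: 3, name: "treatment 3", yield: 0.85 }
].freeze

# Group on the pair [treatment_id, name], like GROUP BY treatment_id, name,
# then average the yield inside each group.
def average_yield_by_treatment(rows)
  rows.group_by { |row| [row[:treatment_id], row[:name]] }
      .map do |(treatment_id, name), group|
        avg = group.sum { |row| row[:yield] } / group.size
        # Round only to keep floating-point noise out of the display.
        { treatment_id: treatment_id, name: name, avg_yield: avg.round(3) }
      end
end

average_yield_by_treatment(SAMPLE_ROWS).each do |result|
  puts format("%d  %s  %.3f", result[:treatment_id], result[:name], result[:avg_yield])
end
```

Because every row in a group shares the same id and name, both values are unambiguous for the group, which is the property the GROUP BY rule enforces.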

To understand this better, suppose the data in the first two columns were not related; for example:

|   year    |       name          |   yield   |  
-----------------------------------------------
|    2000   |   treatment 1       |    0.1    |
|    2000   |   treatment 1       |    0.2    |
|    2000   |   treatment 2       |    0.3    |
|    2000   |   treatment 3       |    0.4    |
|    2001   |   treatment 2       |    0.5    |
|    2001   |   treatment 3       |    0.6    |
|    2002   |   treatment 3       |    0.7    |

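You would still have to group by both year and name, and in this case the average is computed only over rows where both year and name are the same (note that it cannot be done any other way):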

|   year    |       name          |   AVG(yield)   |  
---------------------------------------------------
|    2000   |   treatment 1       |     0.15       |
|    2000   |   treatment 2       |     0.3        |
|    2000   |   treatment 3       |     0.4        |
|    2001   |   treatment 2       |     0.5        |
|    2001   |   treatment 3       |     0.6        |
|    2002   |   treatment 3       |     0.7        |
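To see why it cannot be done any other way, the following plain-Ruby sketch groups the year example by year alone and lists which names end up in each group. YEAR_ROWS and names_per_year are hypothetical names used only for this illustration, and the rows copy the example table above.

```ruby
# Rows mirroring the year/name example table (illustrative data only).
YEAR_ROWS = [
  { year: 2000, name: "treatment 1", yield: 0.1 },
  { year: 2000, name: "treatment 1", yield: 0.2 },
  { year: 2000, name: "treatment 2", yield: 0.3 },
  { year: 2000, name: "treatment 3", yield: 0.4 },
  { year: 2001, name: "treatment 2", yield: 0.5 },
  { year: 2001, name: "treatment 3", yield: 0.6 },
  { year: 2002, name: "treatment 3", yield: 0.7 }
].freeze

# Group by year only and collect the distinct names inside each group.
def names_per_year(rows)
  rows.group_by { |row| row[:year] }
      .transform_values { |group| group.map { |row| row[:name] }.uniq }
end

names_per_year(YEAR_ROWS).each do |year, names|
  puts "#{year}: #{names.size} candidate name(s): #{names.join(', ')}"
end
```

When a group contains more than one candidate name, there is no single value the database could return for that column, which is why a non-aggregated column has to be part of the grouping.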