Modify an SQL query to include additional parameters

Time: 2016-11-28 08:29:24

Tags: sql parameters extraction

I want to extract data with an SQL query, but the code I have does not produce a report that includes all the data I need.

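Basically, the report combines data from many samples (95, to be precise) and then lists the sequences found in those samples. It also compares the sequences to see whether they occur in more than one sample.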

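I want to include the parameters "v_family" and "j_gene" as additional columns. The query needs to take them from one of the samples in which the sequence occurs (in the same way that the amino acid sequence, "amino_acid", is taken from one of the samples in which it occurs).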

How can I add these two parameters to the report?

This is the current query, which produces 6 columns (see also the screenshot):

select 
    value, 
    rank, 
    count(*) over (partition by amino_acid) as contributors, 
    total, 
    amino_acid, 
    sample_name 
from ( select 
        value, 
        row_number() over (partition by sample_name order by rank desc) as rank, 
        sum(value) over (partition by amino_acid) as total, 
        amino_acid, 
        sample_name 
            from ( select 
                        sum(productive_frequency) as value, 
                        sum(productive_frequency) as rank, 
                        amino_acid, 
                        sample_name 
                    from sequences 
                    group by 
                        amino_acid, 
                        sample_name 
                    order by 
                        value desc 
            )inner_query  
    ) outer_inner  
order by 
    sample_name asc, 
    rank
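To make the six columns concrete, here is a minimal sketch that runs the query above, unchanged apart from layout, against an in-memory SQLite table through Python's standard sqlite3 module. The table layout and the rows (samples S1 to S3, sequences CASSL and CARGT) are hypothetical toy data, not the asker's data; the asker's database system is not stated, and SQLite needs version 3.25 or later for window functions. Because the innermost query yields one row per (amino_acid, sample_name), contributors here equals the number of samples in which a sequence appears.

```python
import sqlite3

# Hypothetical toy rows: (sample_name, amino_acid, productive_frequency).
# Frequencies are exact binary fractions so the sums compare exactly.
TOY_ROWS = [
    ("S1", "CASSL", 0.5),
    ("S1", "CASSL", 0.25),  # same sequence twice in S1: summed by GROUP BY
    ("S2", "CASSL", 0.25),
    ("S2", "CARGT", 0.5),
    ("S3", "CASSL", 0.125),
]

# The current query from the question, reflowed but otherwise unchanged.
CURRENT_QUERY = """
select value, rank,
       count(*) over (partition by amino_acid) as contributors,
       total, amino_acid, sample_name
from (select value,
             row_number() over (partition by sample_name order by rank desc) as rank,
             sum(value) over (partition by amino_acid) as total,
             amino_acid, sample_name
      from (select sum(productive_frequency) as value,
                   sum(productive_frequency) as rank,
                   amino_acid, sample_name
            from sequences
            group by amino_acid, sample_name
            order by value desc) inner_query
     ) outer_inner
order by sample_name asc, rank
"""


def run_report():
    """Load the toy rows into an in-memory table and run the report query."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(
            "create table sequences "
            "(sample_name text, amino_acid text, productive_frequency real)"
        )
        conn.executemany("insert into sequences values (?, ?, ?)", TOY_ROWS)
        return conn.execute(CURRENT_QUERY).fetchall()
    finally:
        conn.close()


if __name__ == "__main__":
    # Columns: value, rank, contributors, total, amino_acid, sample_name
    for row in run_report():
        print(row)
```

With these toy rows, CASSL gets contributors = 3 (samples S1, S2, S3) and CARGT gets 1.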

The following edit was suggested, but it did not give me the data I want (see the attached screenshots):

select 
    value, 
    rank, 
    count(*) over (partition by amino_acid) as contributors, 
    total, 
    amino_acid, 
    sample_name 
from ( select 
        value, 
        row_number() over (partition by sample_name order by rank desc) as rank, 
        sum(value) over (partition by amino_acid) as total, 
        amino_acid, 
        sample_name 
            from ( select 
                        sum(productive_frequency) as value, 
                        sum(productive_frequency) as rank, 
                        amino_acid, 
                        sample_name, 
                        v_family 
                    from sequences 
                    group by 
                        amino_acid, 
                        sample_name, 
                        v_family 
                    order by 
                        value desc 
            ) inner_query  
    ) outer_inner  
order by 
    sample_name asc, 
    rank

[Screenshots: old query; new query]

This was also suggested, but it did not change the result:

select 
    value, 
    rank, 
    total, 
    amino_acid, 
    sample_name 
    -- contributors, v_family and j_gene are not selected at this level
from ( select 
        value, 
        row_number() over (partition by sample_name order by rank desc) as rank, 
        sum(value) over (partition by amino_acid) as total, 
        -- computed here, but dropped by the outer select
        count(*) over (partition by amino_acid,v_family,j_gene) as contributors,
        amino_acid, 
        sample_name 
    from ( SELECT sum(productive_frequency) AS value
            ,sum(productive_frequency) AS rank
            ,v_family
            ,j_gene
            ,amino_acid
            ,sample_name
        FROM sequences
        GROUP BY amino_acid
            ,sample_name
            ,v_family
            ,j_gene
        ORDER BY value DESC
    ) inner_query
) outer_inner
order by sample_name asc, rank

OK, it's solved! The correct code is below. Many thanks to everyone who helped!

SELECT value
    ,rank
    ,count(*) OVER (PARTITION BY amino_acid,v_family,j_gene) AS contributors
    ,total
    ,amino_acid
    ,sample_name
    ,v_family
    ,j_gene
FROM (
    SELECT value
        ,row_number() OVER (PARTITION BY sample_name ORDER BY rank DESC) AS rank
        ,sum(value) OVER (PARTITION BY amino_acid,v_family,j_gene) AS total
        ,amino_acid
        ,sample_name
        ,v_family
        ,j_gene
    FROM (
        SELECT sum(productive_frequency) AS value
            ,sum(productive_frequency) AS rank
            ,v_family
            ,j_gene
            ,amino_acid
            ,sample_name
        FROM sequences
        GROUP BY amino_acid
            ,sample_name
            ,v_family
            ,j_gene
        ORDER BY value DESC
        ) inner_query
    ) outer_inner
ORDER BY sample_name ASC
    ,rank

2 Answers:

Answer 0 (score: 0)

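Your multi-level query is built entirely on the innermost query, which selects data from the table/view "sequences". So if you need 2 more parameters, they have to be added in the innermost query, most likely by joining one or more additional tables to the "sequences" table/view.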

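Instead of the current 4 columns (value, rank, amino_acid, sample_name), the innermost query will then have 6 columns (plus gene and family). These 2 extra columns must be included in the GROUP BY, and each outer query must select them as well so that they appear in the topmost query.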
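A minimal sketch of that point, using Python's sqlite3 with two hypothetical toy rows: a column that an inner query does not select is not visible to the query wrapped around it, so it has to be selected (and grouped) at the inner level and passed up through each level. The query names BROKEN and FIXED are illustrative only, and other database systems will word the error differently.

```python
import sqlite3


def make_db():
    """Create an in-memory `sequences` table holding two hypothetical rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "create table sequences (sample_name text, amino_acid text, "
        "v_family text, j_gene text, productive_frequency real)"
    )
    conn.executemany(
        "insert into sequences values (?, ?, ?, ?, ?)",
        [("S1", "CASSL", "V05", "J02", 0.5), ("S2", "CASSL", "V05", "J02", 0.25)],
    )
    return conn


# v_family exists in `sequences`, but the inner query does not select it,
# so the outer query cannot reference it.
BROKEN = """
select value, v_family
from (select sum(productive_frequency) as value, amino_acid, sample_name
      from sequences
      group by amino_acid, sample_name) inner_query
"""

# The extra columns are selected and grouped in the inner query,
# which makes them available to the query one level up.
FIXED = """
select value, v_family, j_gene
from (select sum(productive_frequency) as value, amino_acid, sample_name,
             v_family, j_gene
      from sequences
      group by amino_acid, sample_name, v_family, j_gene) inner_query
order by value desc
"""


def try_query(sql):
    """Return the result rows, or the error message if SQLite rejects the query."""
    conn = make_db()
    try:
        return conn.execute(sql).fetchall()
    except sqlite3.OperationalError as exc:
        return f"error: {exc}"
    finally:
        conn.close()


if __name__ == "__main__":
    print(try_query(BROKEN))
    print(try_query(FIXED))
```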

Answer 1 (score: -1)

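This is probably because 2 new columns were added to the base query. GROUP BY groups the data by the unique combinations of the selected columns, and those combinations differ between the two cases. Compare the results of these queries:

SELECT sum(productive_frequency) AS value
    ,sum(productive_frequency) AS rank
    ,v_family
    ,j_gene
    ,amino_acid
    ,sample_name
FROM sequences
GROUP BY amino_acid
    ,sample_name
    ,v_family
    ,j_gene
ORDER BY value DESC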

SELECT sum(productive_frequency) AS value
    ,sum(productive_frequency) AS rank
    ,amino_acid
    ,sample_name
FROM sequences
GROUP BY amino_acid
    ,sample_name
ORDER BY value DESC

If the row counts differ, you can list the unique combinations of these columns with the following statement:

select distinct v_family
    ,j_gene
    ,amino_acid
    ,sample_name
FROM sequences
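The row-count comparison can be reproduced with a small sketch (Python's sqlite3, hypothetical toy rows). In this made-up data, sample S1 contains the sequence CASSL with two different v_family values, so grouping by four columns yields one more row than grouping by two, and the DISTINCT check shows the same extra combination.

```python
import sqlite3

# Hypothetical toy rows: (sample_name, amino_acid, v_family, j_gene, productive_frequency).
# In S1 the sequence CASSL occurs with two different V families.
TOY_ROWS = [
    ("S1", "CASSL", "V05", "J02", 0.5),
    ("S1", "CASSL", "V12", "J02", 0.25),
    ("S2", "CASSL", "V05", "J02", 0.25),
]

GROUP_BY_TWO = """
select sum(productive_frequency) as value, amino_acid, sample_name
from sequences
group by amino_acid, sample_name
"""

GROUP_BY_FOUR = """
select sum(productive_frequency) as value, v_family, j_gene, amino_acid, sample_name
from sequences
group by amino_acid, sample_name, v_family, j_gene
"""

DISTINCT_COMBOS = """
select distinct v_family, j_gene, amino_acid, sample_name
from sequences
"""


def row_count(sql):
    """Run `sql` against a fresh in-memory copy of the toy table and count the rows."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(
            "create table sequences (sample_name text, amino_acid text, "
            "v_family text, j_gene text, productive_frequency real)"
        )
        conn.executemany("insert into sequences values (?, ?, ?, ?, ?)", TOY_ROWS)
        return len(conn.execute(sql).fetchall())
    finally:
        conn.close()


if __name__ == "__main__":
    for name, sql in [("two", GROUP_BY_TWO), ("four", GROUP_BY_FOUR),
                      ("distinct", DISTINCT_COMBOS)]:
        print(name, row_count(sql))
```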

When the inner_query result changes, change the window function

sum(value) over (partition by amino_acid) as total,

to

sum(value) over (partition by amino_acid,v_family,j_gene) as total,
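The effect of that partition change can be sketched the same way (Python's sqlite3, hypothetical toy rows, grouped by all four columns as in the new inner_query). With partition by amino_acid alone, the window counts every (sample, V family, J gene) row of a sequence, so in this toy data the count is 3 although CASSL occurs in only 2 samples; partitioning by amino_acid, v_family, j_gene gives the total and count per combination. The column names total_by_aa and contributors_by_aa are illustrative only.

```python
import sqlite3

# Hypothetical toy rows: (sample_name, amino_acid, v_family, j_gene, productive_frequency).
TOY_ROWS = [
    ("S1", "CASSL", "V05", "J02", 0.5),
    ("S1", "CASSL", "V12", "J02", 0.25),
    ("S2", "CASSL", "V05", "J02", 0.25),
]

# Inner query grouped by all four columns, then both partitionings side by side.
COMPARE = """
select sample_name, v_family, value,
       sum(value) over (partition by amino_acid) as total_by_aa,
       count(*) over (partition by amino_acid) as contributors_by_aa,
       sum(value) over (partition by amino_acid, v_family, j_gene) as total,
       count(*) over (partition by amino_acid, v_family, j_gene) as contributors
from (select sum(productive_frequency) as value,
             v_family, j_gene, amino_acid, sample_name
      from sequences
      group by amino_acid, sample_name, v_family, j_gene) inner_query
order by sample_name, v_family
"""


def compare_partitions():
    """Load the toy rows into an in-memory table and run the comparison query."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(
            "create table sequences (sample_name text, amino_acid text, "
            "v_family text, j_gene text, productive_frequency real)"
        )
        conn.executemany("insert into sequences values (?, ?, ?, ?, ?)", TOY_ROWS)
        return conn.execute(COMPARE).fetchall()
    finally:
        conn.close()


if __name__ == "__main__":
    # Columns: sample_name, v_family, value, total_by_aa, contributors_by_aa,
    #          total, contributors
    for row in compare_partitions():
        print(row)
```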

New version:

SELECT value
    ,rank
    ,count(*) OVER (PARTITION BY amino_acid,v_family,j_gene) AS contributors
    ,total
    ,amino_acid
    ,sample_name
    ,v_family
    ,j_gene
FROM (
    SELECT value
        ,row_number() OVER (PARTITION BY sample_name ORDER BY rank DESC) AS rank
        ,sum(value) OVER (PARTITION BY amino_acid,v_family,j_gene) AS total
        ,amino_acid
        ,sample_name
        ,v_family
        ,j_gene
    FROM (
        SELECT sum(productive_frequency) AS value
            ,sum(productive_frequency) AS rank
            ,v_family
            ,j_gene
            ,amino_acid
            ,sample_name
        FROM sequences
        GROUP BY amino_acid
            ,sample_name
            ,v_family
            ,j_gene
        ORDER BY value DESC
        ) inner_query
    ) outer_inner
ORDER BY sample_name ASC
    ,rank