Question

This is my table:

     chr  pos refalt
     ---------------
     chr1 123 AA
     chr1 123 AA
     chr1 123 AA
     chr1 123 AA
     chr1 123 AA
     chr1 123 AC
     chr1 123 AC
     chr1 123 AC
     chr2 456 TC
     chr3 789 GC

I need to calculate a specific frequency. Here is an example:

Each row is a patient, so "chr1 123 AA" has 5 patients and "chr1 123 AC" has 3.

I want to know the frequency of A. The calculation is:

13(A)/16, because the 8 patients at "chr1 123" carry 16 alleles in total and 13 of them are A: 5×A (ref) and 5×A (alt), plus 3×A (ref) and 3×C (alt).

For C:

3(C)/16, because only 3 people have C.

refalt is a single column, so I need to split each value to get ref and alt.

How can I implement this in SQL, or is it too complicated?

I know it's a bit complicated, so just ask me if you need more details.

Answer 1

For anyone (especially biologists) who wants to know how to achieve this:

    -- Pool the first and second allele of every row, then count per allele.
    -- UNION ALL keeps duplicate rows; a plain UNION would silently drop them.
    select allele,
           count(*)::numeric /
           (select 2*count(*) from ft_variants where pos_chr = 'chr1 12783') as frequency
    from (
        select substring(refalt from 1 for 1) as allele
        from ft_variants
        where pos_chr = 'chr1 12783'
        union all
        select substring(refalt from 2 for 1) as allele
        from ft_variants
        where pos_chr = 'chr1 12783'
    ) as alleles
    group by allele;
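
To check a query of this kind against the example data from the question, here is a minimal sketch in Python. It uses the standard-library `sqlite3` module as a stand-in for PostgreSQL, so `substr(...)` and `* 1.0` replace `substring(... from ... for ...)` and `::numeric`. The `ft_variants` and `pos_chr` names come from the query above. Storing "chr1 123" as a single `pos_chr` value and the helper name `allele_frequencies` are my assumptions. The query pools both allele positions with `UNION ALL` and then groups by allele, so that A is counted from the AA rows and the AC rows together.

```python
import sqlite3

# Example rows from the question; combining chr and pos into one
# pos_chr value is an assumption made to match the answer's column name.
rows = (
    [("chr1 123", "AA")] * 5
    + [("chr1 123", "AC")] * 3
    + [("chr2 456", "TC"), ("chr3 789", "GC")]
)

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE ft_variants (pos_chr TEXT, refalt TEXT)")
conn.executemany("INSERT INTO ft_variants VALUES (?, ?)", rows)

# Pool the first and second allele of every matching row, then count
# each allele and divide by the total number of alleles (2 per patient).
QUERY = """
SELECT allele,
       COUNT(*) AS n,
       COUNT(*) * 1.0 /
       (SELECT 2 * COUNT(*) FROM ft_variants WHERE pos_chr = :pos) AS frequency
FROM (
    SELECT substr(refalt, 1, 1) AS allele FROM ft_variants WHERE pos_chr = :pos
    UNION ALL
    SELECT substr(refalt, 2, 1) AS allele FROM ft_variants WHERE pos_chr = :pos
)
GROUP BY allele
ORDER BY allele
"""


def allele_frequencies(conn, pos):
    """Return {allele: (count, frequency)} for one position (hypothetical helper)."""
    return {allele: (n, freq) for allele, n, freq in conn.execute(QUERY, {"pos": pos})}


freqs = allele_frequencies(conn, "chr1 123")
print(freqs)  # {'A': (13, 0.8125), 'C': (3, 0.1875)}
```

For "chr1 123" this gives 13/16 for A and 3/16 for C, which matches the hand calculation in the question.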

Specific frequency per sample
