Question

I would like some help with awk to create a file from the product of two input files.

File 1 has 850,000 rows and 50,001 columns of SNP data. The first column is the id.

Here are 3 rows from file 1, with the id and the first 4 SNPs:

A 1 2 1 2   
B 2 2 2 1  
C 1 1 1 1

File 2 has 1 row of 50,000 SNP effects, e.g.:

0.2 -0.1 0.4 0.5

The output I want is the id followed by the sum of each SNP multiplied by its SNP effect, i.e.

for A this would be 1*0.2 + 2*-0.1 + 1*0.4 + 2*0.5 = 1.4

A 1.4
B 1.5
C 1
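The same calculation as a formula. The symbols are introduced here only for illustration: x_{j,k} is the k-th column of row j in file 1 (column 1 is the id), e_i is the i-th effect in file 2, and K is the number of columns in file 1 (50,001 in the real data, 5 in the sample). Each output value is then a dot product, checked here against row B of the sample:

```latex
s_j = \sum_{k=2}^{K} x_{j,k}\, e_{k-1}

s_B = 2(0.2) + 2(-0.1) + 2(0.4) + 1(0.5) = 0.4 - 0.2 + 0.8 + 0.5 = 1.5
```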

Any help would be greatly appreciated.

Roddy

Answer 1

This awk one-liner should work for you:

 awk 'NR==FNR{split($0,a);next}{s=0;for(i=2;i<=NF;i++)s+=a[i-1]*$i;print $1,s}' file2 file1
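To check the one-liner against the sample from the question, the sketch below recreates the 3-row, 4-SNP example in a temporary directory and runs the command unchanged. The file names and the use of mktemp are illustrative assumptions. file2 has to be listed first: while NR==FNR, awk is still reading that file, so split() puts the effects into a[1..n] before any SNP row is processed.

```shell
#!/bin/sh
# Recreate the question's sample (3 rows, 4 SNPs) in a throwaway directory.
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT

printf '%s\n' 'A 1 2 1 2' 'B 2 2 2 1' 'C 1 1 1 1' > "$tmp/file1"  # id + SNP values
printf '%s\n' '0.2 -0.1 0.4 0.5' > "$tmp/file2"                    # SNP effects

# The one-liner from this answer, unchanged; the effects file comes first.
awk 'NR==FNR{split($0,a);next}{s=0;for(i=2;i<=NF;i++)s+=a[i-1]*$i;print $1,s}' "$tmp/file2" "$tmp/file1"
```

With the sample data this should print A 1.4, B 1.5 and C 1 on separate lines, which matches the expected output in the question.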

Answer 2

You can use the following awk script (b.txt is the one-row file of effects, a.txt is the SNP file):

awk 'FNR==NR{split($0,a);next}{t=0;for(i=2;i<=NF;i++){t+=$i*a[i-1]};print $1,t}' b.txt a.txt

The multi-line version is more readable:

calc.awk

# True for the first input file (the one with the factors)
# See: https://www.gnu.org/software/gawk/manual/html_node/Auto_002dset.html#Auto_002dset
FNR==NR{
    # split factors into array a  
    split($0,a)
    next
}
{
    t=0 # total
    # Iterate through fields
    for(i=2;i<=NF;i++){
        # ... and aggregate t 
        t+=$i*a[i-1]
    }
    print $1,t # Output the id along with t
}

Call it like this:

awk -f calc.awk b.txt a.txt
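A caveat for the full data set: when print outputs a non-integer number, awk formats it with OFMT, which defaults to "%.6g", so each total is shown with at most 6 significant digits. The sketch below is a hypothetical variant of calc.awk (the name calc_checked.awk and the "%.10g" precision are arbitrary choices). It prints the total with an explicit printf format and writes a warning to /dev/stderr when a row's SNP count differs from the number of effects. This assumes /dev/stderr is available: gawk handles it specially, and Linux and macOS provide it as a device file.

```shell
#!/bin/sh
# Hypothetical variant of calc.awk; all file names here are illustrative.
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT

cat > "$tmp/calc_checked.awk" <<'EOF'
# First file: the single row of SNP effects; n = number of effects.
FNR==NR {
    n = split($0, a)
    next
}
{
    # Field 1 is the id, so a complete row has n + 1 fields.
    if (NF - 1 != n)
        printf("row %d (%s): %d SNPs, expected %d\n", FNR, $1, NF - 1, n) > "/dev/stderr"
    t = 0
    for (i = 2; i <= NF; i++)
        t += $i * a[i-1]
    # Explicit output format instead of the default OFMT ("%.6g").
    printf "%s %.10g\n", $1, t
}
EOF

printf '%s\n' '0.2 -0.1 0.4 0.5' > "$tmp/b.txt"
printf '%s\n' 'A 1 2 1 2' 'B 2 2 2 1' 'C 1 1 1 1' > "$tmp/a.txt"
awk -f "$tmp/calc_checked.awk" "$tmp/b.txt" "$tmp/a.txt"
```

On the sample files it should print the same A 1.4, B 1.5 and C 1 with no warnings. On the real data, the warning flags any row whose column count does not match the 50,000 effects.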
