Question

I have two files. The first, sam_name.csv, contains the number, day, and name of every sample:

Number,Day,Sample
171386,0,38_171386_D0_2-1.raw
171386,0,38_171386_D0_2-2.raw
171386,2,30_171386_D2_1-1.raw
171386,2,30_171386_D2_1-2.raw
171386,-1,40_171386_D-1_1-1.raw
171386,-1,40_171386_D-1_1-2.raw

The second, sam_batch.csv, contains the batch information (last column):

Number,Day,Quar,Code,M.F,Status,Batch
171386,0,1,x,F,C,1
171386,1,1,x,F,C,2
171386,2,1,x,F,C,5
171386,-1,1,x,F,C,6

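I want to look up the batch (matching on two conditions, number and day) and add it to the first file. I used an awk command to do this, but I only get results for a single time point (day -1).

Here is my command: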

awk -F"," 'NR==FNR{number[$1]=$1;day[$1]=$2;batch[$1]=$7; next}{if($1==number[$1] && $2==day[$1]){print $0 "," number[$1] "," day[$1] "," batch[$1]}}' sam_batch.csv sam_name.csv

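Here is my result: the columns of sam_name.csv, then the number and day from sam_batch.csv (only to check that the conditions work), and finally the batch number (the value I need):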

Number,Day,Sample,Number,Day, Batch
171386,-1,40_171386_D-1_1-1.raw,171386,-1,6
171386,-1,40_171386_D-1_1-2.raw,171386,-1,6
175618,-1,08_175618_D-1_1-1.raw,175618,-1,2

Answer 1

Here is your AWK code, corrected to use Number and Day together as the lookup key:

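awk -F"," 'NR==FNR{
    # First file (sam_batch.csv): map "Number,Day" to the Batch column
    number_day = $1 FS $2
    batch[number_day]=$7
    next
}
{
    # Second file (sam_name.csv): look up the batch by the same composite key
    number_day = $1 FS $2
    print $0 "," batch[number_day]
}' sam_batch.csv sam_name.csv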

Output:

Number,Day,Sample,Batch
171386,0,38_171386_D0_2-1.raw,1
171386,0,38_171386_D0_2-2.raw,1
171386,2,30_171386_D2_1-1.raw,5
171386,2,30_171386_D2_1-2.raw,5
171386,-1,40_171386_D-1_1-1.raw,6
171386,-1,40_171386_D-1_1-2.raw,6

(There is no need to print the number and day from sam_batch.csv as a check once you understand how the script works.)
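
As far as I can tell, the original command matched only day -1 because its arrays are indexed by Number alone: every row of sam_batch.csv with the same Number overwrites the previous one, so only the last row survives. A minimal sketch of that effect, with the batch rows from the question inlined via printf (the variable name `survivors` is only for illustration):

```shell
# Feed the sam_batch.csv rows from the question to awk and index by Number only,
# the way the original command did.
survivors=$(printf '%s\n' \
  'Number,Day,Quar,Code,M.F,Status,Batch' \
  '171386,0,1,x,F,C,1' \
  '171386,1,1,x,F,C,2' \
  '171386,2,1,x,F,C,5' \
  '171386,-1,1,x,F,C,6' |
awk -F"," '
NR > 1 { day[$1] = $2; batch[$1] = $7 }   # same key on every row: later rows overwrite earlier ones
END    { for (n in day) print n, day[n], batch[n] }')

printf '%s\n' "$survivors"   # prints: 171386 -1 6
```

Only one entry is left per Number, which is why the `$2==day[$1]` test can succeed only for the day that happens to come last in the batch file.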

Here's another AWK solution (my original answer):

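awk -v "b=sam_batch.csv" 'BEGIN {
    FS=OFS=","
    # Read the batch file line by line before any main input is processed
    while(( getline line < b) > 0) {
        n = split(line,a)          # split on FS; n is the number of fields
        nd = a[1] FS a[2]          # composite key "Number,Day"
        nd2b[nd] = a[n]            # the last field is the batch
    }
}
{ print $1,$2,$3,nd2b[$1 FS $2] }' sam_name.csv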

Both solutions read sam_batch.csv first to build a dictionary mapping (number, day) to batch. They then read sam_name.csv and print each line's three fields followed by the "Batch" value looked up from that dictionary.
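
One thing neither version handles explicitly is a (number, day) pair that has no row in sam_batch.csv; in that case the lookup yields an empty string. The sketch below is one possible way to make that visible, assuming a placeholder of "NA" is acceptable. It recreates a shortened copy of the sample data in a scratch directory, and the last sample row (day 7) is a made-up example that is not in the original files:

```shell
# Recreate the inputs in a scratch directory so the example is self-contained.
tmp=$(mktemp -d)
cat > "$tmp/sam_batch.csv" <<'EOF'
Number,Day,Quar,Code,M.F,Status,Batch
171386,0,1,x,F,C,1
171386,1,1,x,F,C,2
171386,2,1,x,F,C,5
171386,-1,1,x,F,C,6
EOF
cat > "$tmp/sam_name.csv" <<'EOF'
Number,Day,Sample
171386,0,38_171386_D0_2-1.raw
171386,2,30_171386_D2_1-1.raw
171386,-1,40_171386_D-1_1-1.raw
171386,7,99_171386_D7_1-1.raw
EOF

# First file: build the "Number,Day" -> Batch lookup.
# Second file: append the batch, or "NA" (an assumed placeholder) when the key is unknown.
result=$(awk -F"," '
NR==FNR { batch[$1 FS $2] = $7; next }
{
    key = $1 FS $2
    print $0 "," ((key in batch) ? batch[key] : "NA")
}' "$tmp/sam_batch.csv" "$tmp/sam_name.csv")

printf '%s\n' "$result"
rm -r "$tmp"
```

Testing with `key in batch` rather than reading `batch[key]` directly also avoids creating an empty array entry for every key that is missing.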
