Parsing with awk

Time: 2013-01-10 06:50:30

Tags: parsing awk

How can I use awk to parse a file based on data from another file?

I wrote this script:

BEGIN{ FS="\t" ; OFS="\t"

while((getline<"headfpkm")>0) {
        ++a
        id[a]=$1
        fpkm[a]=$2
        print id[a],fpkm[a]
        }
lastid=id[a]
print lastid
close("headfpkm")
}

/$lastid/{
        print $2,$3,$5,$7,$8,$14,fpkm[a]
        a--
        lastid=id[a]
}
END{ print "total lines=",FNR,"\n\nfile 1 index: ",a}

When I run it:

/$ awk -f testawk.awk file2

The BEGIN section runs correctly, but the search part produces no output:

NM_000014       5.04503
NM_000015       0.586677
NM_000016       1.138332278
NM_000017       0.64386
NM_000018       3.61746
NM_000019       2.8793
NM_000020       10.846
NM_000021       0.685098
NM_000022       46388.6
NM_000026       0.257471
NM_000026
total lines=    10

file 1 index:   10

Is something wrong with the search part?

file2 looks like this:

34      ACADM   NM_000016       9606    hsa-miR-3148    3       80      87      0.003   -0.016  -0.094  0.082   0.112   -0.160  97
34      ACADM   NM_000016       9606    hsa-miR-3163    1       623     629     0.001   -0.022  -0.020  0.065   0.125   -0.01   57
35      ACADS   NM_000017       9606    hsa-miR-3921    3       68      75      0.013   0.192   -0.097  0.031   -0.039  -0.147  82
35      ACADS   NM_000017       9606    hsa-miR-4303    2       67      73      0.012   0.150   -0.052  0.013   -0.039  -0.036  31
35      ACADS   NM_000017       9606    hsa-miR-4653-5p 3       68      75      0.003   0.192   -0.097  0.031   -0.039  -0.157  84
37      ACADVL  NM_000018       9606    hsa-miR-124     2       31      37      0.003   0.023   -0.057  0.012   -0.032  -0.171  76
37      ACADVL  NM_000018       9606    hsa-miR-1827    2       135     141     -0.007  -0.043  -0.058  0.039   -0.069  -0.258  91
37      ACADVL  NM_000018       9606    hsa-miR-2682    2       134     140     0.003   -0.014  -0.058  0.004   -0.047  -0.232  87
37      ACADVL  NM_000018       9606    hsa-miR-449c    2       134     140     -0.035  -0.014  -0.058  0.004   -0.047  -0.270  92
37      ACADVL  NM_000018       9606    hsa-miR-506     2       31      37      -0.016  0.023   -0.057  0.012   -0.032  -0.190  80

1 answer:

Answer 0 (score: 3)

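This is going to be a guess, since I'm not 100% sure what you're trying to accomplish. A better way to solve the problem would be something like this: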

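BEGIN {
    FS=OFS="\t"
}

# First file (headfpkm): store each FPKM value under its ID.
FNR==NR {
    c++

    a[$1]=$2
    next
}

# Second file (file2): print rows whose ID in column 3 was seen in headfpkm.
$3 in a {
    print $2,$3,$5,$7,$8,$14,a[$3]
}

END {
    printf "total lines=%s\n\nfile 1 index: %s\n", FNR, c
}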

Run it like this:

awk -f script.awk headfpkm file2

Results:

ACADM   NM_000016  hsa-miR-3148     80   87   -0.160  1.138332278
ACADM   NM_000016  hsa-miR-3163     623  629  -0.01   1.138332278
ACADS   NM_000017  hsa-miR-3921     68   75   -0.147  0.64386
ACADS   NM_000017  hsa-miR-4303     67   73   -0.036  0.64386
ACADS   NM_000017  hsa-miR-4653-5p  68   75   -0.157  0.64386
ACADVL  NM_000018  hsa-miR-124      31   37   -0.171  3.61746
ACADVL  NM_000018  hsa-miR-1827     135  141  -0.258  3.61746
ACADVL  NM_000018  hsa-miR-2682     134  140  -0.232  3.61746
ACADVL  NM_000018  hsa-miR-449c     134  140  -0.270  3.61746
ACADVL  NM_000018  hsa-miR-506      31   37   -0.190  3.61746
total lines=10

file 1 index: 10
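
As for why the search part of the original script prints nothing: awk does not expand variables inside a /.../ regex literal, so /$lastid/ is the character $ followed by the literal text lastid, not the value of the variable. Depending on the awk implementation that $ is an end-of-line anchor or a plain dollar sign, and neither can match these IDs. The original script also looks for only one ID at a time, starting from the last one read, which is why the array lookup above is simpler. Below is a minimal sketch of the difference; the two-line sample and the shell variable names are made up for illustration, and only the column layout follows file2.

```shell
#!/bin/sh
# Hypothetical two-line sample in the same layout as file2 (ID in column 3).
sample=$(printf '34\tACADM\tNM_000016\n35\tACADS\tNM_000017\n')

# /$lastid/ is a regex literal: "$" followed by the text "lastid".
# The awk variable lastid is never consulted, so no record matches.
literal=$(printf '%s\n' "$sample" | awk -F'\t' -v lastid=NM_000017 '/$lastid/ { print $2 }')

# Comparing the field with the variable uses the variable's value.
exact=$(printf '%s\n' "$sample" | awk -F'\t' -v lastid=NM_000017 '$3 == lastid { print $2 }')

# A dynamic regex match also uses the variable's value as the pattern.
dynamic=$(printf '%s\n' "$sample" | awk -F'\t' -v lastid=NM_000017 '$0 ~ lastid { print $2 }')

printf 'literal=[%s] exact=[%s] dynamic=[%s]\n' "$literal" "$exact" "$dynamic"
```

On this sample the script prints literal=[] exact=[ACADS] dynamic=[ACADS]. The exact comparison on column 3 is probably the safer of the two working forms, since a regex match on the whole line could also hit an ID that merely contains the search text.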