Repeat the field from the line above if the field below is empty

Date: 2019-06-18 19:42:04

Tags: bash awk

I have a tab-delimited file

1A      21      .        SMO     gene_start
1A      3940    .        SMO     gene_end
1A      52236   .        LOC105758527    gene_start
1A      55001   0.469590
1A      65001   0.067909
1A      75001   0.220712
1A      78812   .        LOC105758527    gene_end
1A      79831   .        LOC100218126    gene_start
1A      85001   0.174872
1A      93700   .        LOC100218126    gene_end
1A      96312   .        LOC105758528    gene_start
1A      98792   .        LOC105758528    gene_end
1A      115136  .        LOC105758529    gene_start
1A      125001  0.023420
1A      126187  .        LOC105758529    gene_end

...

and I need to repeat the value from the line above wherever column 4 is blank:

1A      21      .        SMO     gene_start
1A      3940    .        SMO     gene_end
1A      52236   .        LOC105758527    gene_start
1A      55001   0.469590 LOC105758527
1A      65001   0.067909 LOC105758527
1A      75001   0.220712 LOC105758527
1A      78812   .        LOC105758527    gene_end
1A      79831   .        LOC100218126    gene_start
1A      85001   0.174872 LOC100218126
1A      93700   .        LOC100218126    gene_end
1A      96312   .        LOC105758528    gene_start
1A      98792   .        LOC105758528    gene_end
1A      115136  .        LOC105758529    gene_start
1A      125001  0.023420 LOC105758529
1A      126187  .        LOC105758529    gene_end

...

I am doing

awk 'NF==5{v=$4;print} NF==3{print v,$0}' file

but I get

1A      21      .       SMO     gene_start
1A      3940    .       SMO     gene_end
1A      52236   .       LOC105758527    gene_start
LOC105758527 1A 55001   0.469590
LOC105758527 1A 65001   0.067909
LOC105758527 1A 75001   0.220712
1A      78812   .       LOC105758527    gene_end
1A      79831   .       LOC100218126    gene_start
LOC100218126 1A 85001   0.174872
1A      93700   .       LOC100218126    gene_end
1A      96312   .       LOC105758528    gene_start
1A      98792   .       LOC105758528    gene_end
1A      115136  .       LOC105758529    gene_start
LOC105758529 1A 125001  0.023420
1A      126187  .       LOC105758529    gene_end

and I don't know what to change.

3 answers:

Answer 0 (score: 1)

Just assign the stored value to the "missing" field:

awk 'NF==5{v=$4}NF==3{$4=v}1' OFS="\t" file

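The final 1 is just a shortcut that avoids writing a print statement.

In awk, when a pattern is true and no action is given, the default action is to print the input line.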

$ echo "test" |awk '1'

is equivalent to:

echo "test"|awk '1==1'

echo "test"|awk '{if (1==1){print}}'

This is because 1 always evaluates to true.
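The OFS="\t" part of the one-liner above matters because assigning to a field makes awk rebuild the record, joining the fields with OFS. A minimal sketch of the difference, using a made-up two-line sample rather than the asker's file (the variable names sample, with_space, and with_tab are only for illustration; OFS is set with -v here, which should behave the same as the trailing OFS="\t" assignment):

```shell
# Hypothetical two-line sample: a 5-field gene line, then a 3-field value line.
sample=$(printf '1A\t52236\t.\tLOC105758527\tgene_start\n1A\t55001\t0.469590\n')

# Default OFS is a single space, so the rebuilt 3-field line loses its tabs.
with_space=$(printf '%s\n' "$sample" | awk 'NF==5{v=$4} NF==3{$4=v} 1')

# With OFS set to a tab, the rebuilt line stays tab-delimited.
with_tab=$(printf '%s\n' "$sample" | awk -v OFS='\t' 'NF==5{v=$4} NF==3{$4=v} 1')

printf '%s\n---\n%s\n' "$with_space" "$with_tab"
```

Only the line whose field was assigned is rebuilt; the 5-field line is printed unchanged in both runs.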

Answer 1 (score: 1)

Something like this:

awk '!$4 {$0=$0 FS t} {t=$4} 1' "OFS=\t" file
1A      21      .        SMO     gene_start
1A      3940    .        SMO     gene_end
1A      52236   .        LOC105758527    gene_start
1A      55001   0.469590 LOC105758527
1A      65001   0.067909 LOC105758527
1A      75001   0.220712 LOC105758527
1A      78812   .        LOC105758527    gene_end
1A      79831   .        LOC100218126    gene_start
1A      85001   0.174872 LOC100218126
1A      93700   .        LOC100218126    gene_end
1A      96312   .        LOC105758528    gene_start
1A      98792   .        LOC105758528    gene_end
1A      115136  .        LOC105758529    gene_start
1A      125001  0.023420 LOC105758529
1A      126187  .        LOC105758529    gene_end

Is $4 empty? No: store $4 in t.
Is $4 empty? Yes: append t to the line.

A shorter version, written as a single ternary expression:

awk '!$4?$0=$0FS t:t=$4' OFS="\t" file
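The logic of this answer (remember the last non-empty fourth field, and append it to any line that lacks one) can also be spelled out as an explicit if/else, which may be easier to read. This is only a sketch on a hypothetical three-line sample. The names sample, filled, and last are illustrative, and it tests $4 == "" instead of !$4 so that a literal 0 in column 4 would not count as empty:

```shell
# Hypothetical sample: gene_start line, one value line, gene_end line.
sample=$(printf '1A\t52236\t.\tLOC105758527\tgene_start\n1A\t55001\t0.469590\n1A\t78812\t.\tLOC105758527\tgene_end\n')

filled=$(printf '%s\n' "$sample" | awk '
{
    if ($4 == "") {
        # Fourth field missing: append the remembered value after a tab.
        $0 = $0 "\t" last
    } else {
        # Fourth field present: remember it for the lines that follow.
        last = $4
    }
    print
}')

printf '%s\n' "$filled"
```

Because the record is extended by string concatenation and no field is assigned, the existing tabs are left untouched and OFS plays no role.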

Answer 2 (score: 1)

Assuming all the field-separating tabs are present even when the fields are empty (as in any normal CSV or TSV file):

$ awk 'BEGIN{FS=OFS="\t"} $4==""{$4=prev} {prev=$4} 1' file
1A      21      .       SMO     gene_start
1A      3940    .       SMO     gene_end
1A      52236   .       LOC105758527    gene_start
1A      55001   0.469590        LOC105758527
1A      65001   0.067909        LOC105758527
1A      75001   0.220712        LOC105758527
1A      78812   .       LOC105758527    gene_end
1A      79831   .       LOC100218126    gene_start
1A      85001   0.174872        LOC100218126
1A      93700   .       LOC100218126    gene_end
1A      96312   .       LOC105758528    gene_start
1A      98792   .       LOC105758528    gene_end
1A      115136  .       LOC105758529    gene_start
1A      125001  0.023420        LOC105758529
1A      126187  .       LOC105758529    gene_end
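Whether that assumption holds for a given file can be checked by counting the tab-separated fields on each line. A sketch, using a hypothetical sample in which the value line keeps its empty trailing fields (the names sample, counts, and filled are illustrative only):

```shell
# Hypothetical sample: the value line ends in two tabs (empty $4 and $5).
sample=$(printf '1A\t52236\t.\tLOC105758527\tgene_start\n1A\t55001\t0.469590\t\t\n')

# Count tab-separated fields per line; here both lines should report 5.
counts=$(printf '%s\n' "$sample" | awk -F'\t' '{print NF}')

# Fill the empty fourth field from the previous line's value.
filled=$(printf '%s\n' "$sample" | awk 'BEGIN{FS=OFS="\t"} $4==""{$4=prev} {prev=$4} 1')

printf '%s\n%s\n' "$counts" "$filled"
```

With a tab as FS, an empty field is a real field, so $4=="" identifies the lines to fill, and the empty fifth field survives as a trailing tab.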