I have a tab-delimited file:
1A 21 . SMO gene_start
1A 3940 . SMO gene_end
1A 52236 . LOC105758527 gene_start
1A 55001 0.469590
1A 65001 0.067909
1A 75001 0.220712
1A 78812 . LOC105758527 gene_end
1A 79831 . LOC100218126 gene_start
1A 85001 0.174872
1A 93700 . LOC100218126 gene_end
1A 96312 . LOC105758528 gene_start
1A 98792 . LOC105758528 gene_end
1A 115136 . LOC105758529 gene_start
1A 125001 0.023420
1A 126187 . LOC105758529 gene_end
...
and I need to fill the blanks in column 4 by repeating the value from the row above:
1A 21 . SMO gene_start
1A 3940 . SMO gene_end
1A 52236 . LOC105758527 gene_start
1A 55001 0.469590 LOC105758527
1A 65001 0.067909 LOC105758527
1A 75001 0.220712 LOC105758527
1A 78812 . LOC105758527 gene_end
1A 79831 . LOC100218126 gene_start
1A 85001 0.174872 LOC100218126
1A 93700 . LOC100218126 gene_end
1A 96312 . LOC105758528 gene_start
1A 98792 . LOC105758528 gene_end
1A 115136 . LOC105758529 gene_start
1A 125001 0.023420 LOC105758529
1A 126187 . LOC105758529 gene_end
...
I am running:
awk 'NF==5{v=$4;print} NF==3{print v,$0}' file
but I get:
1A 21 . SMO gene_start
1A 3940 . SMO gene_end
1A 52236 . LOC105758527 gene_start
LOC105758527 1A 55001 0.469590
LOC105758527 1A 65001 0.067909
LOC105758527 1A 75001 0.220712
1A 78812 . LOC105758527 gene_end
1A 79831 . LOC100218126 gene_start
LOC100218126 1A 85001 0.174872
1A 93700 . LOC100218126 gene_end
1A 96312 . LOC105758528 gene_start
1A 98792 . LOC105758528 gene_end
1A 115136 . LOC105758529 gene_start
LOC105758529 1A 125001 0.023420
1A 126187 . LOC105758529 gene_end
I don't know what to change.
Answer 0: (score: 1)
Just assign the stored value to the "missing" field:
awk 'NF==5{v=$4}NF==3{$4=v}1' OFS="\t" file
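The trailing 1 is just a shortcut that avoids an explicit print statement. In awk, when a condition matches and no action is given, the default action is to print the input line.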
$ echo "test" |awk '1'
This is equivalent to:
echo "test"|awk '1==1'
echo "test"|awk '{if (1==1){print}}'
This works because 1 is always true.
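As a quick check of the approach, here is a minimal sketch that runs the same awk program on a three-line sample. The sample rows and the temporary file are made up for illustration; only the awk program comes from the answer.

```shell
# Hypothetical three-line, tab-separated sample written to a temporary file.
tmp=$(mktemp)
printf '1A\t52236\t.\tLOC105758527\tgene_start\n1A\t55001\t0.469590\n1A\t78812\t.\tLOC105758527\tgene_end\n' > "$tmp"

# Remember column 4 on 5-field rows, assign it to $4 on 3-field rows,
# and let the trailing 1 print every row.
result=$(awk 'NF==5{v=$4} NF==3{$4=v} 1' OFS='\t' "$tmp")
rm -f "$tmp"

# Show tabs as "|" so the column boundaries are visible.
printf '%s\n' "$result" | tr '\t' '|'
```

Assigning to $4 makes awk rebuild the record with OFS, which is why the filled row comes out tab-separated. Rows that are not modified are printed exactly as they were read.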
Answer 1: (score: 1)
Something like this:
awk '!$4 {$0=$0 FS t} {t=$4} 1' "OFS=\t" file
Or a bit shorter:
awk '!$4?$0=$0FS t:t=$4' OFS="\t" file
1A 21 . SMO gene_start
1A 3940 . SMO gene_end
1A 52236 . LOC105758527 gene_start
1A 55001 0.469590 LOC105758527
1A 65001 0.067909 LOC105758527
1A 75001 0.220712 LOC105758527
1A 78812 . LOC105758527 gene_end
1A 79831 . LOC100218126 gene_start
1A 85001 0.174872 LOC100218126
1A 93700 . LOC100218126 gene_end
1A 96312 . LOC105758528 gene_start
1A 98792 . LOC105758528 gene_end
1A 115136 . LOC105758529 gene_start
1A 125001 0.023420 LOC105758529
1A 126187 . LOC105758529 gene_end
How it works:
Is field 4 missing? No: store $4 in t.
Is field 4 missing? Yes: append t to the line.
This is a shorter take on Juan's version.
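One detail worth checking in the one-liners above: the stored value is joined to the line with FS, which is a single space by default, and because $0 is assigned directly rather than a field, OFS is not applied. The sketch below compares that behaviour with a variant that splits and joins on tabs. The two-line sample and the variant are assumptions added for illustration, not part of the answer.

```shell
# Hypothetical two-line sample: a gene row followed by a 3-field value row.
sample=$(printf '1A\t52236\t.\tLOC105758527\tgene_start\n1A\t55001\t0.469590\n')

# As written in the answer: the gene name is joined with FS (a space by default).
with_fs=$(printf '%s\n' "$sample" | awk '!$4 {$0=$0 FS t} {t=$4} 1' "OFS=\t")

# Variant (an assumption, not from the answer): split on tabs and join with OFS.
with_ofs=$(printf '%s\n' "$sample" | awk 'BEGIN{FS=OFS="\t"} $4=="" {$0=$0 OFS t} {t=$4} 1')

# Show tabs as "|" so the difference in the last separator is visible.
printf '%s\n' "$with_fs"  | tail -n 1 | tr '\t' '|'
printf '%s\n' "$with_ofs" | tail -n 1 | tr '\t' '|'
```

If downstream tools split on whitespace, the space makes no difference. If they split strictly on tabs, the variant keeps the gene name in its own column.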
Answer 2: (score: 1)
This assumes every field-separating tab is present even when a field is empty (as in any normal CSV or TSV file):
$ awk 'BEGIN{FS=OFS="\t"} $4==""{$4=prev} {prev=$4} 1' file
1A 21 . SMO gene_start
1A 3940 . SMO gene_end
1A 52236 . LOC105758527 gene_start
1A 55001 0.469590 LOC105758527
1A 65001 0.067909 LOC105758527
1A 75001 0.220712 LOC105758527
1A 78812 . LOC105758527 gene_end
1A 79831 . LOC100218126 gene_start
1A 85001 0.174872 LOC100218126
1A 93700 . LOC100218126 gene_end
1A 96312 . LOC105758528 gene_start
1A 98792 . LOC105758528 gene_end
1A 115136 . LOC105758529 gene_start
1A 125001 0.023420 LOC105758529
1A 126187 . LOC105758529 gene_end
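To make the stated assumption concrete, here is a minimal sketch in which the value row keeps its empty columns 4 and 5 as two trailing tabs. The sample rows and the temporary file are invented for illustration; the awk program is the one from this answer.

```shell
# Hypothetical sample: the middle row has empty columns 4 and 5,
# so it ends with two tab characters.
tmp=$(mktemp)
printf '1A\t52236\t.\tLOC105758527\tgene_start\n1A\t55001\t0.469590\t\t\n1A\t78812\t.\tLOC105758527\tgene_end\n' > "$tmp"

# With FS set to a tab, an empty column 4 compares equal to "" and is
# replaced by the last non-empty value seen in that column.
filled=$(awk 'BEGIN{FS=OFS="\t"} $4==""{$4=prev} {prev=$4} 1' "$tmp")
rm -f "$tmp"

# Show tabs as "|" so the column boundaries are visible.
printf '%s\n' "$filled" | tr '\t' '|'
```

The filled row keeps its empty fifth column, so it still ends with a separator after the gene name.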