如何根据数字删除行

时间:2019-02-11 23:22:53

标签: bash awk sed

我有这样的数据

>sp|Q96A73|P33MX_HUMAN Putative monooxygenase p33MONOX OS=Homo sapiens OX=9606 GN=KIAA1191 PE=1 SV=1
305
>sp|P13674|P4HA1_HUMAN Prolyl 4-hydroxylase subunit alpha-1 OS=Homo sapiens OX=9606 GN=P4HA1 PE=1 SV=2
534
>sp|Q7Z4N8|P4HA3_HUMAN Prolyl 4-hydroxylase subunit alpha-3 OS=Homo sapiens OX=9606 GN=P4HA3 PE=1 SV=1
544
>sp|P04637|P53_HUMAN Cellular tumor antigen p53 OS=Homo sapiens OX=9606 GN=TP53 PE=1 SV=4
393
>sp|Q9UHX1|PUF60_HUMAN Poly(U)-binding-splicing factor PUF60 OS=Homo sapiens OX=9606 GN=PUF60 PE=1 SV=1
559
>sp|Q06416|P5F1B_HUMAN Putative POU domain, class 5, transcription factor 1B OS=Homo sapiens OX=9606 GN=POU5F1B PE=5 SV=2
359
>sp|O14683|P5I11_HUMAN Tumor protein p53-inducible protein 11 OS=Homo sapiens OX=9606 GN=TP53I11 PE=1 SV=2
189
>sp|Q14671|PUM1_HUMAN Pumilio homolog 1 OS=Homo sapiens OX=9606 GN=PUM1 PE=1 SV=3
1186

每行下面都有一个数字,我想确定那些数字小于350的行,并从中删除行

我想得到这样的输出

P13674 534
Q7Z4N8 544
P04637 393
Q9UHX1 559
Q06416 359
Q14671 1186

我可以尝试抓住||之间的字符串。但我无法删除

awk -F '[| ]' '/^>/ { print $3}' < data.txt

3 个答案:

答案 0 :(得分:2)

$ awk -F'|' '/^>/{v=$2; next} $0>=350{print v, $0}' file
P13674 534
Q7Z4N8 544
P04637 393
Q9UHX1 559
Q06416 359
Q14671 1186

答案 1 :(得分:1)

借助粘贴命令再次awk

paste - - < learner.txt  | awk -F"[|\t]" ' $NF>350 { print $2,$NF } '

具有给定的输入

$ cat learner.txt
>sp|Q96A73|P33MX_HUMAN Putative monooxygenase p33MONOX OS=Homo sapiens OX=9606 GN=KIAA1191 PE=1 SV=1
305
>sp|P13674|P4HA1_HUMAN Prolyl 4-hydroxylase subunit alpha-1 OS=Homo sapiens OX=9606 GN=P4HA1 PE=1 SV=2
534
>sp|Q7Z4N8|P4HA3_HUMAN Prolyl 4-hydroxylase subunit alpha-3 OS=Homo sapiens OX=9606 GN=P4HA3 PE=1 SV=1
544
>sp|P04637|P53_HUMAN Cellular tumor antigen p53 OS=Homo sapiens OX=9606 GN=TP53 PE=1 SV=4
393
>sp|Q9UHX1|PUF60_HUMAN Poly(U)-binding-splicing factor PUF60 OS=Homo sapiens OX=9606 GN=PUF60 PE=1 SV=1
559
>sp|Q06416|P5F1B_HUMAN Putative POU domain, class 5, transcription factor 1B OS=Homo sapiens OX=9606 GN=POU5F1B PE=5 SV=2
359
>sp|O14683|P5I11_HUMAN Tumor protein p53-inducible protein 11 OS=Homo sapiens OX=9606 GN=TP53I11 PE=1 SV=2
189
>sp|Q14671|PUM1_HUMAN Pumilio homolog 1 OS=Homo sapiens OX=9606 GN=PUM1 PE=1 SV=3
1186

$ paste - - < learner.txt  | awk -F"[|\t]" ' $NF>350 { print $2,$NF } '
P13674 534
Q7Z4N8 544
P04637 393
Q9UHX1 559
Q06416 359
Q14671 1186

$

答案 2 :(得分:1)

您也可以尝试Perl

perl -F"\|" -lane ' /^(\d+)/ and $1>350 and print $p,"\t",$1; $p=$F[1] ' 

具有给定的输入

$ cat learner.txt
>sp|Q96A73|P33MX_HUMAN Putative monooxygenase p33MONOX OS=Homo sapiens OX=9606 GN=KIAA1191 PE=1 SV=1
305
>sp|P13674|P4HA1_HUMAN Prolyl 4-hydroxylase subunit alpha-1 OS=Homo sapiens OX=9606 GN=P4HA1 PE=1 SV=2
534
>sp|Q7Z4N8|P4HA3_HUMAN Prolyl 4-hydroxylase subunit alpha-3 OS=Homo sapiens OX=9606 GN=P4HA3 PE=1 SV=1
544
>sp|P04637|P53_HUMAN Cellular tumor antigen p53 OS=Homo sapiens OX=9606 GN=TP53 PE=1 SV=4
393
>sp|Q9UHX1|PUF60_HUMAN Poly(U)-binding-splicing factor PUF60 OS=Homo sapiens OX=9606 GN=PUF60 PE=1 SV=1
559
>sp|Q06416|P5F1B_HUMAN Putative POU domain, class 5, transcription factor 1B OS=Homo sapiens OX=9606 GN=POU5F1B PE=5 SV=2
359
>sp|O14683|P5I11_HUMAN Tumor protein p53-inducible protein 11 OS=Homo sapiens OX=9606 GN=TP53I11 PE=1 SV=2
189
>sp|Q14671|PUM1_HUMAN Pumilio homolog 1 OS=Homo sapiens OX=9606 GN=PUM1 PE=1 SV=3
1186

$ perl -F"\|" -lane ' /^(\d+)/ and $1>350 and print $p,"\t",$1; $p=$F[1] ' learner.txt
P13674  534
Q7Z4N8  544
P04637  393
Q9UHX1  559
Q06416  359
Q14671  1186

$