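I want to list all data files in all subdirectories of "my_data_path". The files should match both of these conditions:
- column 7: equals the keyword "mystring"
- column 20: value <= 0.01
It seems awk does not handle the second condition ($20 <= 0.01) correctly. The values in that column range from 0 to 1, but the column can also contain ".", and I think that may be causing the problem. In these data files "." is supposed to be treated as 0. So how can I convert "." to 0 on the fly while awk is matching?
Here is my current version: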
find my_data_path -type f -name '*out.txt' -exec awk -F"\t" '{if(($7=="mystring")&&($20<=0.01)){print}}' {} \;
Sample data:
chr1 69511 69511 A G exonic OR4F5 . nonsynonymous SNV OR4F5:NM_001005484:exon1:c.A421G:p.T141A Score=0.994828;Name=chr19:60000 . . . . . . rs2691305 1 0.9394
chr1 877831 877831 T C exonic SAMD11 . nonsynonymous SNV SAMD11:NM_152486:exon10:c.T1027C:p.W343R . . . . . . . rs6672356 1 0.9999
chr1 878667 878667 G T exonic SAMD11 . nonsynonymous SNV SAMD11:NM_152486:exon12:c.G1599T:p.E533D . . . . . . . rs201447515 0.003 8.74E-05
chr1 881627 881627 G A exonic NOC2L . synonymous SNV NOC2L:NM_015658:exon16:c.C1843T:p.L615L . . . . . . . rs2272757 0.66 0.5653
chr1 887801 887801 A G exonic NOC2L . synonymous SNV NOC2L:NM_015658:exon10:c.T1182C:p.T394T . . . . . . . rs3828047 0.96 0.9355
chr1 888639 888639 T C exonic NOC2L . synonymous SNV NOC2L:NM_015658:exon9:c.A918G:p.E306E . . . . . . . rs3748596 0.71 0.070
chr1 914333 914333 C G exonic PERM1 . nonsynonymous SNV PERM1:NM_001291366:exon2:c.G2077C:p.E693Q,PERM1:NM_001291367:exon3:c.G1795C:p.E599Q . . . . . . . rs13302979 0.81 0.6617
chr1 914852 914852 G C exonic PERM1 . nonsynonymous SNV PERM1:NM_001291366:exon2:c.C1558G:p.Q520E,PERM1:NM_001291367:exon3:c.C1276G:p.Q426E . . . . . . . rs13303368 0.71 0.595
chr1 914876 914876 T C exonic PERM1 . nonsynonymous SNV PERM1:NM_001291366:exon2:c.A1534G:p.S512G,PERM1:NM_001291367:exon3:c.A1252G:p.S418G . . . . . . . rs13302983 1 0.9664
chr1 914940 914940 T C exonic PERM1 . synonymous SNV PERM1:NM_001291366:exon2:c.A1470G:p.A490A,PERM1:NM_001291367:exon3:c.A1188G:p.A396A . . . . . . . rs13303033 0.71 0.5874
chr1 983473 983473 G T exonic AGRN . nonsynonymous SNV AGRN:NM_198576:exon23:c.G3833T:p.R1278L . . . . . . . rs542631667 0.0004 2.57E-05
chr1 984302 984302 T C exonic AGRN . synonymous SNV AGRN:NM_198576:exon24:c.T4161C:p.T1387T . Benign not_specified RCV000116269.2 MedGen CN169374 . rs9442391 0.84 0.6295
chr1 990280 990280 C T exonic AGRN . synonymous SNV AGRN:NM_198576:exon36:c.C6057T:p.D2019D . Benign not_specified RCV000116281.2 MedGen CN169374 . rs4275402 0.82 0.6376
chr1 1007203 1007203 A G exonic RNF223 . synonymous SNV RNF223:NM_001205252:exon2:c.T744C:p.D248D . . . . . . . rs4633229 0.92 0.8154
chr1 1007432 1007432 G A exonic RNF223 . nonsynonymous SNV RNF223:NM_001205252:exon2:c.C515T:p.A172V . . . . . . . rs4333796 0.8 0.5721
chr1 1147422 1147422 C T exonic TNFRSF4 . synonymous SNV TNFRSF4:NM_003327:exon5:c.G534A:p.E178E . . . . . . . rs17568 0.78 0.3751
chr1 1158631 1158631 A G exonic SDF4 . synonymous SNV SDF4:NM_016176:exon4:c.T570C:p.D190D,SDF4:NM_016547:exon4:c.T570C:p.D190D . . . . . . . rs6603781 1 0.9166
chr1 1220954 1220954 G A exonic SCNN1D . synonymous SNV SCNN1D:NM_001130413:exon6:c.G468A:p.S156S . . . . . . . rs12751100 . .
chr1 1222257 1222257 A C exonic SCNN1D . nonsynonymous SNV SCNN1D:NM_001130413:exon8:c.A1021C:p.T341P . . . . . . . . . .
So I expect:
SAMD11 < 0.01 (the value in column 20 is < 0.01)
SCNN1D < 0.01 (since column 20 is "." => 0)
NOC2L < 0.01 (since column 20 > 0.01)
Please advise. Thanks!
Answer 0 (score: 0)
Like this?
First, some test material:
$ mkdir -p test/dir1 test/dir2
$ cat > test/dir1/good # pasted your sample file here
$ echo foo > test/dir2/bad # this won't match
Then the solution:
$ awk '$7~/SCNN1D/ && $20<=0.01{print FILENAME;nextfile}' test/*/* 2>/dev/null
test/dir1/good
GNU awk is required because of nextfile. Explained:
$ awk ' # awk has been assigned
$7~/SCNN1D/ && $20<=0.01 { # mystring is now SCNN1D
print FILENAME # on match output FILENAME
nextfile # and skip to next file
}' test/*/* 2>/dev/null # dirs under test/ cause output to stderr
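The one-liner above compares $20 directly, so a "." in that column is never converted to 0. Because "." does not look like a number, awk compares it as a string, and the result then depends on character ordering rather than on the value. It also uses the default whitespace splitting, under which a field such as "nonsynonymous SNV" counts as two columns, so the column numbers shift if the data is really tab-separated. Below is a minimal sketch of one way to handle both points, assuming the files are tab-delimited (as the -F"\t" in the question suggests) and that "." should be read as 0. The function name list_matching_files and its two arguments are hypothetical, chosen only for illustration. It matches column 7 exactly rather than by regex, and it avoids nextfile, so it should not depend on that feature being available.

```shell
#!/bin/sh
# Hypothetical helper (not from the original post): print each *out.txt file
# under a directory that has at least one row where column 7 equals the
# keyword and column 20 is <= 0.01, reading "." in column 20 as 0.
# Assumption: the data files are tab-separated.
list_matching_files() {
    dir=$1      # directory to search, e.g. my_data_path
    keyword=$2  # exact value expected in column 7, e.g. mystring
    find "$dir" -type f -name '*out.txt' -exec awk -F'\t' -v key="$keyword" '
        FNR == 1 { seen = 0 }              # first line of a new file: reset the flag
        seen     { next }                  # this file was already reported
        {
            v = ($20 == "." ? 0 : $20 + 0) # "." becomes 0, anything else is forced numeric
        }
        $7 == key && v <= 0.01 {
            print FILENAME                 # report the file name once
            seen = 1
        }
    ' {} +
}
```

Usage would be something like list_matching_files my_data_path mystring. To print the matching rows instead of the file names, replace the last awk rule with a plain print and drop the seen flag.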