Question

I want to list all data files, in all subdirectories of "my_data_path", that contain rows matching both of these conditions:
- column 7: matches the keyword "mystring"
- column 20: value is <= 0.01

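It seems awk does not work properly with the second condition ($20 <= 0.01), because the values in that column range from 0 to 1 but also include ".". I think that may be causing the problem. A "." in the data file is supposed to be treated as 0. So how can I change "." to 0 on the fly during the awk match?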

Here is my current version:

find my_data_path -type f -name '*out.txt' -exec awk -F"\t" '{if (($7 == "mystring") && ($20 <= 0.01)) {print}}' {} \;

The sample data looks like this:

chr1    69511   69511   A   G   exonic  OR4F5   .   nonsynonymous SNV   OR4F5:NM_001005484:exon1:c.A421G:p.T141A    Score=0.994828;Name=chr19:60000 .   .   .   .   .   .   rs2691305   1   0.9394
chr1    877831  877831  T   C   exonic  SAMD11  .   nonsynonymous SNV   SAMD11:NM_152486:exon10:c.T1027C:p.W343R    .   .   .   .   .   .   .   rs6672356   1   0.9999
chr1    878667  878667  G   T   exonic  SAMD11  .   nonsynonymous SNV   SAMD11:NM_152486:exon12:c.G1599T:p.E533D    .   .   .   .   .   .   .   rs201447515 0.003   8.74E-05
chr1    881627  881627  G   A   exonic  NOC2L   .   synonymous SNV  NOC2L:NM_015658:exon16:c.C1843T:p.L615L .   .   .   .   .   .   .   rs2272757   0.66    0.5653
chr1    887801  887801  A   G   exonic  NOC2L   .   synonymous SNV  NOC2L:NM_015658:exon10:c.T1182C:p.T394T .   .   .   .   .   .   .   rs3828047   0.96    0.9355
chr1    888639  888639  T   C   exonic  NOC2L   .   synonymous SNV  NOC2L:NM_015658:exon9:c.A918G:p.E306E   .   .   .   .   .   .   .   rs3748596   0.71    0.070
chr1    914333  914333  C   G   exonic  PERM1   .   nonsynonymous SNV   PERM1:NM_001291366:exon2:c.G2077C:p.E693Q,PERM1:NM_001291367:exon3:c.G1795C:p.E599Q .   .   .   .   .   .   .   rs13302979  0.81    0.6617
chr1    914852  914852  G   C   exonic  PERM1   .   nonsynonymous SNV   PERM1:NM_001291366:exon2:c.C1558G:p.Q520E,PERM1:NM_001291367:exon3:c.C1276G:p.Q426E .   .   .   .   .   .   .   rs13303368  0.71    0.595
chr1    914876  914876  T   C   exonic  PERM1   .   nonsynonymous SNV   PERM1:NM_001291366:exon2:c.A1534G:p.S512G,PERM1:NM_001291367:exon3:c.A1252G:p.S418G .   .   .   .   .   .   .   rs13302983  1   0.9664
chr1    914940  914940  T   C   exonic  PERM1   .   synonymous SNV  PERM1:NM_001291366:exon2:c.A1470G:p.A490A,PERM1:NM_001291367:exon3:c.A1188G:p.A396A .   .   .   .   .   .   .   rs13303033  0.71    0.5874
chr1    983473  983473  G   T   exonic  AGRN    .   nonsynonymous SNV   AGRN:NM_198576:exon23:c.G3833T:p.R1278L .   .   .   .   .   .   .   rs542631667 0.0004  2.57E-05
chr1    984302  984302  T   C   exonic  AGRN    .   synonymous SNV  AGRN:NM_198576:exon24:c.T4161C:p.T1387T .   Benign  not_specified   RCV000116269.2  MedGen  CN169374    .   rs9442391   0.84    0.6295
chr1    990280  990280  C   T   exonic  AGRN    .   synonymous SNV  AGRN:NM_198576:exon36:c.C6057T:p.D2019D .   Benign  not_specified   RCV000116281.2  MedGen  CN169374    .   rs4275402   0.82    0.6376
chr1    1007203 1007203 A   G   exonic  RNF223  .   synonymous SNV  RNF223:NM_001205252:exon2:c.T744C:p.D248D   .   .   .   .   .   .   .   rs4633229   0.92    0.8154
chr1    1007432 1007432 G   A   exonic  RNF223  .   nonsynonymous SNV   RNF223:NM_001205252:exon2:c.C515T:p.A172V   .   .   .   .   .   .   .   rs4333796   0.8 0.5721
chr1    1147422 1147422 C   T   exonic  TNFRSF4 .   synonymous SNV  TNFRSF4:NM_003327:exon5:c.G534A:p.E178E .   .   .   .   .   .   .   rs17568 0.78    0.3751
chr1    1158631 1158631 A   G   exonic  SDF4    .   synonymous SNV  SDF4:NM_016176:exon4:c.T570C:p.D190D,SDF4:NM_016547:exon4:c.T570C:p.D190D   .   .   .   .   .   .   .   rs6603781   1   0.9166
chr1    1220954 1220954 G   A   exonic  SCNN1D  .   synonymous SNV  SCNN1D:NM_001130413:exon6:c.G468A:p.S156S   .   .   .   .   .   .   .   rs12751100  .   .
chr1    1222257 1222257 A   C   exonic  SCNN1D  .   nonsynonymous SNV   SCNN1D:NM_001130413:exon8:c.A1021C:p.T341P  .   .   .   .   .   .   .   .   .   .

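So I expect the following.

The file name of this data file should be found if I search for:

SAMD11 < 0.01 (the value in column 20 is < 0.01)

SCNN1D < 0.01 (because column 20 is "." => 0)

The file name of this data file should not be found if I search for:

NOC2L < 0.01 (because column 20 is > 0.01)

Please advise. Thanks!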

Answer 1

Like this?

First, some test material:

$ mkdir -p test/dir1 test/dir2
$ cat > test/dir1/good          # pasted your sample file here
$ echo foo > test/dir2/bad      # this won't match

Then the solution:

$ awk '$7~/SCNN1D/ && $20<=0.01{print FILENAME;nextfile}' test/*/* 2>/dev/null
test/dir1/good

Because of nextfile, this requires GNU awk. Explained:

$ awk '                      # using awk
$7~/SCNN1D/ && $20<=0.01 {   # mystring is now SCNN1D
    print FILENAME           # on match output FILENAME
    nextfile                 # and skip to next file
}' test/*/* 2>/dev/null      # dirs under test/ cause output to stderr
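
If you would rather not rely on how awk compares "." with 0.01, the conversion can be made explicit. As far as I understand it, the one-liner above accepts "." only because awk falls back to a string comparison when a field does not look like a number, and "." happens to sort before "0.01" in the C locale. Below is a minimal sketch that maps "." to 0 itself and uses find, as in the question, to descend into every subdirectory. The function name list_matching_files and its three arguments are made up for illustration, and it assumes the files are tab-separated and named *out.txt. It avoids nextfile, so it should not need GNU awk.

```shell
# Hypothetical helper (not part of the original answer): print the name of
# every *out.txt file under DIR that has at least one row where column 7
# equals GENE and column 20 is <= CUTOFF. A "." in column 20 counts as 0.
list_matching_files() {
    dir=$1 gene=$2 cutoff=$3
    find "$dir" -type f -name '*out.txt' -exec awk -F'\t' \
        -v gene="$gene" -v cutoff="$cutoff" '
        FNR == 1 { found = 0 }                 # new file: reset the flag
        found    { next }                      # already reported this file
        $7 == gene {
            v = ($20 == "." ? 0 : $20 + 0)     # "." -> 0, otherwise force a number
            if (v <= cutoff) { print FILENAME; found = 1 }
        }' {} +
}
```

It would be called as list_matching_files my_data_path SCNN1D 0.01. Adding 0 to the field forces a numeric comparison, so values such as 8.74E-05 are compared as numbers rather than as text.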
