Question

I have a table, snp150Common.txt, in which the second and third fields ($2 and $3) may or may not be equal.

If they are equal, I want $2 to become $2-1, so that:

chr1    10177   10177   rs367896724 -   -   -/C insertion   near-gene-5
chr1    10352   10352   rs555500075 -   -   -/A insertion   near-gene-5
chr1    11007   11008   rs575272151 C   C   C/G single      near-gene-5
chr1    11011   11012   rs544419019 C   C   C/G single      near-gene-5
chr1    13109   13110   rs540538026 G   G   A/G single      intron
chr1    13115   13116   rs62635286  T   T   G/T single      intron
chr1    13117   13118   rs62028691  A   A   C/T single      intron
chr1    13272   13273   rs531730856 G   G   C/G single      ncRNA
chr1    14463   14464   rs546169444 A   A   A/T single      near-gene-3,ncRNA

becomes:

chr1    10176   10177   rs367896724 -   -   -/C insertion   near-gene-5
chr1    10351   10352   rs555500075 -   -   -/A insertion   near-gene-5
chr1    11007   11008   rs575272151 C   C   C/G single      near-gene-5
chr1    11011   11012   rs544419019 C   C   C/G single      near-gene-5
chr1    13109   13110   rs540538026 G   G   A/G single      intron
chr1    13115   13116   rs62635286  T   T   G/T single      intron
chr1    13117   13118   rs62028691  A   A   C/T single      intron
chr1    13272   13273   rs531730856 G   G   C/G single      ncRNA
chr1    14463   14464   rs546169444 A   A   A/T single      near-gene-3,ncRNA

My current command, adapted from https://askubuntu.com/a/312843:

zcat < snp150/snp150Common.txt.gz | head | awk '{ if ($2 == $3) $2=$2-1; print $0 }' | cut -f 2,3,4,5,8,9,10,12,16

gives the same output, i.e. the table is unchanged.

Any help is much appreciated.

Answer 1

This answer is based on pure speculation about the format of the source file:

$ zcat snp150/snp150Common.txt.gz | 
  awk '
  BEGIN { OFS="\t" }                       # field separators are most likely tabs
  {
      if ($3 == $4)                        # based on cut these should be compared
          $3=$3-1
      print $2,$3,$4,$5,$8,$9,$10,$12,$16  # ... and these fields printed
  }
  NR==10 { exit }'                         # this replaces head
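
To try the same logic without the real file, here is a minimal sketch that works on rows that have already been cut, so the coordinates sit in columns 2 and 3 as in the question's tables. It assumes tab-separated input; the function name shift_start is hypothetical, and the two sample rows are shortened from the question.

```shell
# shift_start is an illustrative name, not part of the original answer.
# Assumption: input is tab-separated, with start in column 2 and end in column 3.
shift_start() {
    awk 'BEGIN { FS = OFS = "\t" }   # read and write tabs so the columns stay aligned
         $2 == $3 { $2 = $2 - 1 }    # start equals end: move start back by one
         { print }'
}

# Two rows shortened from the question: one insertion, one single-base SNP.
printf 'chr1\t10177\t10177\trs367896724\nchr1\t11007\t11008\trs575272151\n' | shift_start
```

Setting OFS matters here: when awk assigns to a field it rebuilds the line using OFS, which defaults to a single space, so a later cut -f (which splits on tabs) would no longer see separate columns on the modified lines.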

Remember: practice (of anything but sucking) makes you suck less.

awk: conditionally change the value of a field based on the value of another column
