I have a table, snp150Common.txt, in which the second and third fields ($2 and $3) may or may not be equal.
If they are equal, I want $2 to become $2-1, so that:
chr1 10177 10177 rs367896724 - - -/C insertion near-gene-5
chr1 10352 10352 rs555500075 - - -/A insertion near-gene-5
chr1 11007 11008 rs575272151 C C C/G single near-gene-5
chr1 11011 11012 rs544419019 C C C/G single near-gene-5
chr1 13109 13110 rs540538026 G G A/G single intron
chr1 13115 13116 rs62635286 T T G/T single intron
chr1 13117 13118 rs62028691 A A C/T single intron
chr1 13272 13273 rs531730856 G G C/G single ncRNA
chr1 14463 14464 rs546169444 A A A/T single near-gene-3,ncRNA
becomes:
chr1 10176 10177 rs367896724 - - -/C insertion near-gene-5
chr1 10351 10352 rs555500075 - - -/A insertion near-gene-5
chr1 11007 11008 rs575272151 C C C/G single near-gene-5
chr1 11011 11012 rs544419019 C C C/G single near-gene-5
chr1 13109 13110 rs540538026 G G A/G single intron
chr1 13115 13116 rs62635286 T T G/T single intron
chr1 13117 13118 rs62028691 A A C/T single intron
chr1 13272 13273 rs531730856 G G C/G single ncRNA
chr1 14463 14464 rs546169444 A A A/T single near-gene-3,ncRNA
My current command, adapted from https://askubuntu.com/a/312843:
zcat < snp150/snp150Common.txt.gz | head | awk '{ if ($2 == $3) $2=$2-1; print $0 }' | cut -f 2,3,4,5,8,9,10,12,16
gives the same output.
Any help is much appreciated.
Answer 0 (score: 1)
This answer is based on pure speculation about the format of the source file:
$ zcat snp150/snp150Common.txt.gz |
awk '
BEGIN { OFS="\t" }        # field separators are most likely tabs
{
    if ($3 == $4)         # judging by the cut field list, these are the fields to compare
        $3 = $3 - 1
    print $2,$3,$4,$5,$8,$9,$10,$12,$16   # ... and these are the fields printed
}
NR==10 { exit }'          # this replaces head
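To try the idea without the full file, here is a minimal sketch on two rows. It assumes, as the answer does, that the raw file is tab-separated and has one extra leading column before the chromosome. The function name fix_start and the leading value 585 are made up for illustration, and only the first few columns are kept.

```shell
# Hypothetical helper: decrement the start ($3) when it equals the end ($4).
# Assumes tab-separated input whose first column is an extra leading field.
fix_start() {
    awk 'BEGIN { FS = OFS = "\t" }   # split and rejoin on tabs
         $3 == $4 { $3 = $3 - 1 }    # start equals end: move the start back by one
         { print $2, $3, $4, $5 }'   # drop the first column, keep the next four
}

# Two sample rows in the assumed layout (extra column, chrom, start, end, name).
printf '585\tchr1\t10177\t10177\trs367896724\n585\tchr1\t11007\t11008\trs575272151\n' |
    fix_start
```

Doing the column selection inside awk, rather than piping into cut afterwards, sidesteps a likely pitfall of the original command: once awk assigns to a field it rebuilds the line with OFS, which defaults to a single space, so a later cut -f (which splits on tabs) would no longer see separate columns.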