I have a tab-separated file that looks like this:
1 MGNVFEKLFKSLFGKKEMRILMVGLDAAGKTITIKLKLGEIVTTIPTIGFNVETVEYKNISFTVWDVGGQDKIRPLWRHYFQNTQGLIFVVDSNDRERVNEAREELTRMLAEDELRDAVLLVFVNKQDLPNAMNAAEITDKLGLHSLRQRNWYIQATCATSGDGLYEGLDWLSNQLKNQK V
2 MGNVFEKLFKSLFGKKEMRILMVGLDAAGKTITIKLKLGEIVTTIPTIGFNVETVEYKNISFTVWDVGGQDKIRPLWRHYFQNTQGLIFVVDSNDRERVNEAREELTRMLAEDELRDAVLLVFVNKQDLPNAMNAAEITDKLGLHSLRQRNWYIQATCATSGDGLYEGLDWLSNQLKNQK M
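And so on... The first column is a number, the second column is a protein sequence, and the third column is the last character on the line: the pattern to find in the corresponding sequence in each case. So the desired output would look like this:
1:positions:4 23 43 53 56 65 68 91 92 100 120 123 125
2:positions:1 18 22 110 134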
I have tried using awk and the index function.
nawk -F'\t' -v p=$3 'index($2,p) {printf "%s:positions:", NR; s=$2; m=0; while((n=index(s, p))>0) {m+=n; printf "%s ", m; s=substr(s, n+1)} print ""}' "file.tsv"
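However, this only works when the variable given with -v is a literal character or string, not $3. How can I do this in a Unix environment? Thanks in advance.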
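Part of the problem is that the -v p=$3 in the command above is expanded by the shell before awk starts. So p receives the shell's third positional parameter (normally empty), not the third field of each input line. Below is a minimal sketch of the same index() loop with the pattern read from awk's own $3 on every line. It assumes tab-separated columns. The helper name find_positions and the two sample lines are made up for illustration.

```shell
#!/bin/sh
# find_positions is a hypothetical helper name. Input lines are assumed to be
# tab-separated: id, sequence, pattern. Reads the named files, or stdin if none.
find_positions() {
    awk -F'\t' '{
        p = $3                  # pattern comes from the third field of the current line
        printf "%s:positions:", $1
        s = $2
        m = 0
        # index() returns the 1-based offset of p in s, or 0 when p is absent
        while (p != "" && (n = index(s, p)) > 0) {
            m += n              # offset in the remaining suffix -> offset in $2
            printf " %d", m
            s = substr(s, n + 1)    # keep searching after this match
        }
        print ""
    }' "$@"
}

# Usage with a made-up two-line sample
printf '1\tMGNVFVV\tV\n2\tMGNMM\tM\n' | find_positions
```

On the sample it prints "1:positions: 4 6 7" and "2:positions: 1 4 5". Unlike the original command, it uses $1 rather than NR as the label, so the output follows the first column even if lines are skipped.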
Answer 0 (score: 0)
You can do it like this:
awk -F'\t' '{ len=split($2,arr,""); printf "%s:positions:",$1 ; for(i=1;i<=len;i++) { if(arr[i] == $3 ) { printf "%s ",i } }; print "" }' file.tsv
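First split the subject $2 completely into an array of single characters, then loop over it, check whether each element equals $3, and print the array index whenever it does.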
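Printing the array index gives the position because split() with an empty separator stores one character per element, numbered from 1. The small sketch below shows this on a made-up three-letter string. The function name show_indices is hypothetical. gawk and mawk support splitting on an empty string, but POSIX leaves that behavior unspecified, so treat it as an assumption about your awk.

```shell
#!/bin/sh
# show_indices (hypothetical name): print each character of every input line
# next to the index split() assigns to it.
show_indices() {
    awk '{
        len = split($0, arr, "")    # one character per element: arr[1] .. arr[len]
        for (i = 1; i <= len; i++)
            print i, arr[i]
    }'
}

echo 'MGV' | show_indices
```

This prints "1 M", "2 G" and "3 V" on separate lines, so a loop that prints indices must run from 1 to len inclusive.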
Answer 1 (score: 0)
Perl to the rescue:
perl -wane '
print "$F[0]:positions:";
$i = 0;
print " ", $i while ($i = 1 + index $F[1], $F[2], $i) > 0;
print "\n";
' -- file
If the space after the colon is a problem, you can complicate it to
$i = $f = 0;
$f = print " " x $f, $i while ($i = 1 + index $F[1], $F[2], $i) > 0;
Answer 2 (score: 0)
A gawk solution:
gawk -v FPAT="[[:digit:]]+|[[:alpha:]]" '{
r=$1":positions:"; for(i=2;i<NF;i++) { if($i==$NF) r=r" "i-1 } print r
}' file.tsv
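FPAT="[[:digit:]]+|[[:alpha:]]" - a regex pattern that defines what a field looks like (a run of digits, or a single letter)
for(i=2;i<NF;i++) - iterates over the fields holding the letters of column 2 (the last field, $NF, is the pattern)
Output: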
1:positions: 4 23 43 53 56 65 68 91 92 100 120 123 125
2:positions: 1 18 22 110 134
Answer 3 (score: 0)
awk '{
str=$1":positions:";
n=0; k=split($2,a,$3); # use $3 as the delimiter to split $2 into array a; k = number of pieces
for(i=1;i<k;i++){ # k pieces means k-1 occurrences of $3
n+=length(a[i])+1;str=str" "n # the next $3 comes right after piece i, at position n+length(a[i])+1
}
print str
}' file.tsv
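Here is a worked trace of the split-on-$3 arithmetic on a made-up short sequence. Splitting MGNVFVV on V gives the pieces "MGN", "F", "" and a trailing "". Adding each piece's length plus one for the delimiter gives positions 4, 6 and 7. The function name show_split_positions is hypothetical. Like the answer, the sketch assumes the pattern is a single letter.

```shell
#!/bin/sh
# show_split_positions (hypothetical name): input lines are "sequence pattern".
# Splitting the sequence on the pattern leaves one piece before each match,
# and each match comes immediately after the piece that precedes it.
show_split_positions() {
    awk '{
        k = split($1, piece, $2)        # k pieces means k - 1 occurrences of $2
        n = 0
        for (i = 1; i < k; i++) {
            n += length(piece[i]) + 1   # skip piece i, then land on the pattern
            printf "piece %d = \"%s\" -> match at %d\n", i, piece[i], n
        }
    }'
}

echo 'MGNVFVV V' | show_split_positions
```

The last piece (after the final V) is never counted, which is why the loop stops at k - 1.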
Answer 4 (score: 0)
$ awk '{out=$1 ":positions:"; for (i=1;i<=length($2);i++) { c=substr($2,i,1); if (c == $3) out = out " " i}; print out}' file
1:positions: 4 23 43 53 56 65 68 91 92 100 120 123 125
2:positions: 1 18 22 110 134
Answer 5 (score: 0)
A simple Perl solution:
use strict;
use warnings;
while ( <DATA> ) {
    chomp;
    next if /^\s*$/;                # just in case you have empty lines
    my @data = split ' ';           # fields are separated by tabs or spaces
    my @positions;                  # 1-based positions where the pattern occurs
    my $c = 0;                      # position in the string
    for my $char ( split //, $data[1] ) {
        $c++;
        push @positions, $c if $char eq $data[2];
    }
    print "$data[0]:positions:"
        . join(' ', @positions)     # assemble result in the desired form
        . "\n";
}
__DATA__
1 MGNVFEKLFKSLFGKKEMRILMVGLDAAGKTTILYKLKLGEIVTTIPTIGFNVETVEYKNISFTVWDVGGQDKIRPLWRHYFQNTQGLIFVVDSNDRERVNEAREELTRMLAEDELRDAVLLVFVNKQDLPNAMNAAEITDKLGLHSLRQRNWYIQATCATSGDGLYEGLDWLSNQLKNQK V
2 MGNVFEKLFKSLFGKKEMRILMVGLDAAGKTTILYKLKLGEIVTTIPTIGFNVETVEYKNISFTVWDVGGQDKIRPLWRHYFQNTQGLIFVVDSNDRERVNEAREELTRMLAEDELRDAVLLVFVNKQDLPNAMNAAEITDKLGLHSLRQRNWYIQATCATSGDGLYEGLDWLSNQLKNQK M
Answer 6 (score: -1)