I am using the regular expression:
>\.*<
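to match certain parts of field3, but I can't figure out how to replace each match with a run of characters that preserves the original string length.
Input: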
field1 field2 >>>>>.>............>>>.........<<<.......>>>>.......<<<<.<.<<<<<.
Expected output:
field1 field2 >>>>>.>............>>LLLLLLLLLLL<<.......>>>LLLLLLLLL<<<.<.<<<<<.
My poor, failed attempt:
awk 'match($3, />\.*</){split($3, sst, "");for(i=RSTART;i<=RLENGTH;i++){sst[i]="L"};joined=sep="";for(x=1; x in sst;x++){joined=joined sep sst[x];sep=""};printf("%s\n", joined)}' hg19-matRNA.tsv > test2.tsv
Any help is greatly appreciated!
Answer 0 (score: 1)
With GNU awk, for the third argument to match() and for gensub():
$ cat tst.awk
{
    while ( match($3,/(.*)(>\.*<)(.*)/,a) ) {
        $3 = a[1] gensub(/./,"L","g",a[2]) a[3]
    }
    print
}
$ awk -f tst.awk file
field1 field2 >>>>>.>............>>LLLLLLLLLLL<<.......>>>LLLLLLLLL<<<.<.<<<<<.
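If gawk is not available, the same loop can probably be written with only POSIX features: the two-argument match(), plus RSTART, RLENGTH, substr() and gsub(). The following is a minimal sketch rather than a tested drop-in. The shell wrapper name mask_spans is hypothetical, and the sketch assumes the fields are separated by single spaces, because assigning to $3 rebuilds the record with OFS.

```shell
# mask_spans is a hypothetical wrapper name, not part of the original answer.
mask_spans() {
    awk '
    {
        # Keep going until field 3 has no ">", dots, "<" span left.
        # Each pass turns one span into Ls, which can never match again.
        while ( match($3, />\.*</) ) {
            tgt = substr($3, RSTART, RLENGTH)   # the matched span
            gsub(/./, "L", tgt)                 # same length, every char is L
            $3 = substr($3, 1, RSTART-1) tgt substr($3, RSTART+RLENGTH)
        }
        print
    }' "$@"
}
```

Call it as mask_spans file, or pipe input into it. For tab-separated data you would presumably need to set FS and OFS to a tab so the rebuilt record keeps its delimiters.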
Answer 1 (score: 0)
An awk solution. You can also use patsplit (GNU awk only) like this:
$ cat tst.awk
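{
    patsplit($3, a, ">\\.+<", seps)    # a[1..n]: matches, seps[0..n]: text around them
    l = (length(a) > length(seps) ? length(a) : length(seps))
    s = ""                             # reset the accumulator for each record
    for (i = 0; i < l; i++) {
        if (i in a) gsub(/./, "L", a[i])
        s = s ((i in a) ? a[i] seps[i] : seps[i])
    }
    $3 = s
}1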
$ awk -f tst.awk file
field1 field2 >>>>>.>............>>LLLLLLLLLLL<<.......>>>LLLLLLLLL<<<.<.<<<<<.