I have a file full of lines like these:
>Mouse|chr9:95713136-95716028 | element 1367 | positive | hindbrain (rhombencephalon)[5/8] | midbrain (mesencephalon)[3/8] | other[7/8]
>Mouse|chr16:90449561-90451327 | element 1672 | positive | forebrain[4/8] | heart[6/8]
>Mouse|chr3:137446183-137449401 | element 4 | positive | heart[3/4]
What I want is this:
Mouse chr9 95713136 95716028 element 1367 positive hindbrain (rhombencephalon)[5/8]|midbrain (mesencephalon)[3/8]|other[7/8]
That is, everything after "positive" should end up in a single pipe-delimited column, and all columns should be tab-separated. This is what I did:
sed -E 's/ *[>\|:-] */\t/g' mouse_genome_vista1.txt > mouse_genome_vista2.txt
sed "s/^[ \t]*//" -i mouse_genome_vista2.txt
My output looks like this:
Mouse chr9 95713136 95716028 element 1367 positive hindbrain (rhombencephalon)[5/8] midbrain (mesencephalon)[3/8] other[7/8]
Mouse chr16 90449561 90451327 element 1672 positive forebrain[4/8] heart[6/8]
Mouse chr3 137446183 137449401 element 4 positive heart[3/4]
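If there is only one term after "positive", this works: the term ends up alone in its own column. But if there are several terms, I get several columns. For example, hindbrain, midbrain and other each land in their own tab-separated column, whereas I want them in one column, separated by pipes.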
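Answer 0 (score: 0)
You can try this regex with Perl (the lookahead and \K are Perl-style features that awk's POSIX regular expressions do not support):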
[|:-](?=.*positive)|positive\s+\K\|
Sample Perl solution (note that it operates on a string rather than a file):
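use strict;
use warnings;

# Sample input held in a string; single quotes keep it literal.
my $str = 'Mouse|chr9:95713136-95716028 | element 1367 | positive | hindbrain (rhombencephalon)[5/8] | midbrain (mesencephalon)[3/8] | other[7/8]
Mouse|chr16:90449561-90451327 | element 1672 | positive | forebrain[4/8] | heart[6/8]
Mouse|chr3:137446183-137449401 | element 4 | positive | heart[3/4]
';
# Alternative 1: a |, : or - that still has "positive" later on the same line.
# Alternative 2: the pipe right after "positive"; \K keeps "positive" out of the match.
my $regex = qr/[|:-](?=.*positive)|positive\s+\K\|/;
# Double quotes so that \t is a real tab; '\\t' would insert a backslash followed by "t".
my $subst = "\t";
# /r returns the modified copy (Perl 5.14+); spaces around the old separators are left in place.
my $result = $str =~ s/$regex/$subst/rg;
print $result;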
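If you would rather stay on the command line and read the file directly, the same idea can be sketched with awk: split each line at "positive", turn the separators before it into tabs, and leave the pipes after it alone. This is only a minimal sketch under a few assumptions: the function name reformat_vista is made up for illustration, every record is assumed to contain a "| positive" field, and the padding spaces around the pipes are assumed to be unwanted.

```shell
# Hypothetical helper (the name is illustrative, not from the question).
# Reads the named files, or stdin when no file is given.
reformat_vista() {
  awk '
  {
    sub(/^>/, "")                 # drop the leading ">" marker
    gsub(/ *\| */, "|")           # remove the padding around every pipe
    key = "|positive"
    i = index($0, key)
    if (i == 0) { print; next }   # no "positive" field: print the line as cleaned so far
    head = substr($0, 1, i + length(key) - 1)   # everything up to and including "positive"
    tail = substr($0, i + length(key))          # the rest, starting with "|" if present
    gsub(/[|:-]/, "\t", head)     # separators before "positive" become tabs
    sub(/^\|/, "\t", tail)        # only the first pipe after "positive" becomes a tab
    print head tail
  }' "$@"
}
```

With the file names from the question, a call would look like: reformat_vista mouse_genome_vista1.txt > mouse_genome_vista2.txt. Splitting the line in two avoids the need for a lookahead, which is why this works in plain awk.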