Dear Stack Overflow community,
I have a two-column file that looks like this:
Ccrux.00013.c0_g1_i1 .
Ccrux.00013.c0_g2_i1 .
Ccrux.00014.c0_g1_i1 .
Ccrux.00014.c0_g2_i1 .
Ccrux.00015.c0_g1_i1 .
Ccrux.00015.c0_g1_i1 GO:0005789^cellular_component^endoplasmic reticulum membrane`GO:0016021^cellular_component^integral component of membrane`GO:0005509^molecular_function^calcium ion binding`GO:0005506^molecular_function^iron ion binding`GO:0031418^molecular_function^L-ascorbic acid binding`GO:0016706^molecular_function^oxidoreductase activity, acting on paired donors, with incorporation or reduction of molecular oxygen, 2-oxoglutarate as one donor, and incorporation of one atom each of oxygen into both donors`GO:0045646^biological_process^regulation of erythrocyte differentiation
Ccrux.00015.c0_g2_i1 GO:0005789^cellular_component^endoplasmic reticulum membrane`GO:0016021^cellular_component^integral component of membrane`GO:0005509^molecular_function^calcium ion binding`GO:0005506^molecular_function^iron ion binding`GO:0031418^molecular_function^L-ascorbic acid binding`GO:0016706^molecular_function^oxidoreductase activity, acting on paired donors, with incorporation or reduction of molecular oxygen, 2-oxoglutarate as one donor, and incorporation of one atom each of oxygen into both donors`GO:0045646^biological_process^regulation of erythrocyte differentiation
Ccrux.00016.c0_g1_i1 .
Ccrux.00016.c0_g2_i1 .
Ccrux.00017.c0_g1_i1 .
Ccrux.00018.c0_g1_i1 .
Ccrux.00019.c0_g1_i1 .
I need a new two-column file.
The new file should look like this:
Ccrux.00015.c0_g1_i1 GO:0005789,GO:0016021,GO:0005509,GO:0005506,GO:0031418,GO:0016706,GO:0045646
Ccrux.00015.c0_g2_i1 GO:0005789,GO:0016021,GO:0005509,GO:0005506,GO:0031418,GO:0016706,GO:0045646
Ccrux.00029.c0_g1_i1 GO:0035869,GO:0005737,GO:0005615,GO:0016020,GO:0021956,GO:0060271,GO:0021904,GO:0001701,GO:0001841,GO:0008589,GO:0021523,GO:0021537
I have been trying with Perl:
perl -ne '/(GO:\d+)/ && print "$1"' input.file > output.file
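But that only prints all the GO numbers in a single column. I am really lost as to how to do this. Any suggestions would be welcome.
Thanks in advance, everyone.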
Answer 0 (score: 0)
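What you have there is a pattern that matches one piece of text and then prints it.
From the sound of it, what you are dealing with is chunks like this:
GO:0005789^cellular_component^endoplasmic reticulum membrane`
Are you trying to delete everything between the ^ and the next GO?
The nice thing about perl is that -ne just wraps a while loop around your code, so it lets you run multiple statements.
So, an expanded example: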
#!/usr/bin/env perl
use strict;
use warnings;
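while (<DATA>) {
    next unless m/GO/;     # skip lines with no GO annotation
    s/\^[^`]+`/,/g;        # "^description`" between two IDs becomes a comma
    s/\^[^`]+$/\n/g;       # trailing "^description" becomes a newline
    print;
}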
__DATA__
Ccrux.00013.c0_g1_i1 .
Ccrux.00013.c0_g2_i1 .
Ccrux.00014.c0_g1_i1 .
Ccrux.00014.c0_g2_i1 .
Ccrux.00015.c0_g1_i1 .
Ccrux.00015.c0_g1_i1 GO:0005789^cellular_component^endoplasmic reticulum membrane`GO:0016021^cellular_component^integral component of membrane`GO:0005509^molecular_function^calcium ion binding`GO:0005506^molecular_function^iron ion binding`GO:0031418^molecular_function^L-ascorbic acid binding`GO:0016706^molecular_function^oxidoreductase activity, acting on paired donors, with incorporation or reduction of molecular oxygen, 2-oxoglutarate as one donor, and incorporation of one atom each of oxygen into both donors`GO:0045646^biological_process^regulation of erythrocyte differentiation
Ccrux.00015.c0_g2_i1 GO:0005789^cellular_component^endoplasmic reticulum membrane`GO:0016021^cellular_component^integral component of membrane`GO:0005509^molecular_function^calcium ion binding`GO:0005506^molecular_function^iron ion binding`GO:0031418^molecular_function^L-ascorbic acid binding`GO:0016706^molecular_function^oxidoreductase activity, acting on paired donors, with incorporation or reduction of molecular oxygen, 2-oxoglutarate as one donor, and incorporation of one atom each of oxygen into both donors`GO:0045646^biological_process^regulation of erythrocyte differentiation
Ccrux.00016.c0_g1_i1 .
Ccrux.00016.c0_g2_i1 .
Ccrux.00017.c0_g1_i1 .
Ccrux.00018.c0_g1_i1 .
Ccrux.00019.c0_g1_i1 .
This generates the output:
Ccrux.00015.c0_g1_i1 GO:0005789,GO:0016021,GO:0005509,GO:0005506,GO:0031418,GO:0016706,GO:0045646
Ccrux.00015.c0_g2_i1 GO:0005789,GO:0016021,GO:0005509,GO:0005506,GO:0031418,GO:0016706,GO:0045646
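Here we:
- skip any line that does not contain GO;
- replace every run made up of ^, one or more characters that are not a backtick, and then a backtick, with a comma;
- replace the last such run, which ends at the end of the line instead of at a backtick, with \n.
That way we can condense it into a one-liner: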
perl -ne 'next unless m/GO/;s/\^[^`]+`/,/g;s/\^[^`]+$/\n/g;print' inputfile > outputfile
Or perhaps better, if you would rather not call print yourself, see perlrun: -p is like -n, but it has the print built in (so it behaves more like sed).
Answer 1 (score: 0)
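I think your requirement is a little too long for a one-line solution, but it can still be very short. This program produces the output you describe. It expects the path to the input file as an argument on the command line.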
use strict;
use warnings;
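while ( <> ) {
    # collect every GO ID on the line; skip lines that have none
    next unless my @values = /GO:\d+/g;
    local $" = ',';            # join interpolated arrays with commas
    s/\S\s+\K.+/@values/;      # keep the ID column, replace the rest
    print;
}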
A one-line version would be a bit clunky:
perl -ne '@v=/GO:\d+/g or next; $"=","; s/\S\s+\K.+/@v/; print;' myfile > newfile