Expand lines containing multiple data entries into separate lines with one entry each

Date: 2019-02-03 16:38:11

Tags: bash parsing text awk grep

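I have a file in which the first column is an identifier, and the rest of each line contains zero or more numbers separated by single spaces.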

For example:

SOAP.k35.scaffold280 0003723 
SOAP.k35.scaffold421 
SOAP.k35.scaffold429 0004930 0016021
TRINITY_DN23171_c1_g1_i2 0006457 0005509 0030246 0051082 0005788
SOAP.k35.scaffold599 0007411 0033627 0035001 0016321 0007507 0035011 0007498 0045886 0030155 0030334 0045995 0034446 0005102 0030424 0005604 0030054 0036062 0008021

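I want each numeric entry on a line to go on its own line, preceded by the appropriate first-column identifier (i.e. SOAP... or TRINITY...), with " = " between the identifier and the number. I also want to drop lines that contain no numbers after the first-column identifier.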

Example of the result after processing the text above:

SOAP.k35.scaffold280 = 0003723
SOAP.k35.scaffold429 = 0004930
SOAP.k35.scaffold429 = 0016021
TRINITY_DN23171_c1_g1_i2 = 0006457
TRINITY_DN23171_c1_g1_i2 = 0005509
TRINITY_DN23171_c1_g1_i2 = 0030246

...and so on.

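My main problem is knowing how to store the first-column identifier so that I can put it at the start of each new line I create while parsing through the numeric entries of a line.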
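A minimal sketch of the idea behind the question, assuming the input is whitespace-separated as in the sample: awk already splits each line into fields, so the identifier is simply `$1`, and it can be kept in a variable while looping over the remaining fields. The function name `expand_ids` is hypothetical. Lines with only an identifier produce no output because the loop body never runs, which also takes care of removing them.

```shell
#!/usr/bin/env bash
# expand_ids (hypothetical name): reads "ID n1 n2 ..." lines on stdin and
# prints one "ID = n" line per number.
expand_ids() {
  awk '{
    id = $1                       # store the first-column identifier
    for (i = 2; i <= NF; i++)     # fields 2..NF are the numeric entries
      print id " = " $i           # a line with only an ID (NF == 1) prints nothing
  }'
}

# Usage example on two lines taken from the sample input
printf 'SOAP.k35.scaffold421\nSOAP.k35.scaffold429 0004930 0016021\n' | expand_ids
```

This prints two lines for `SOAP.k35.scaffold429` and nothing for `SOAP.k35.scaffold421`.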

Any help is greatly appreciated.

3 answers:

Answer 0 (score: 1)

Could you please try the following.

awk '(/^SOAP/ || /^TRINITY/){for(i=2;i<=NF;i++){print $1" = "$i}}' Input_file

If you don't want to restrict the awk program strictly to lines starting with the strings SOAP or TRINITY, then try the following.

awk '{for(i=2;i<=NF;i++){print $1" = "$i}}' Input_file

The output is as follows.

SOAP.k35.scaffold280 = 0003723
SOAP.k35.scaffold429 = 0004930
SOAP.k35.scaffold429 = 0016021
TRINITY_DN23171_c1_g1_i2 = 0006457
TRINITY_DN23171_c1_g1_i2 = 0005509
TRINITY_DN23171_c1_g1_i2 = 0030246
TRINITY_DN23171_c1_g1_i2 = 0051082
TRINITY_DN23171_c1_g1_i2 = 0005788
SOAP.k35.scaffold599 = 0007411
SOAP.k35.scaffold599 = 0033627
SOAP.k35.scaffold599 = 0035001
SOAP.k35.scaffold599 = 0016321
SOAP.k35.scaffold599 = 0007507
SOAP.k35.scaffold599 = 0035011
SOAP.k35.scaffold599 = 0007498
SOAP.k35.scaffold599 = 0045886
SOAP.k35.scaffold599 = 0030155
SOAP.k35.scaffold599 = 0030334
SOAP.k35.scaffold599 = 0045995
SOAP.k35.scaffold599 = 0034446
SOAP.k35.scaffold599 = 0005102
SOAP.k35.scaffold599 = 0030424
SOAP.k35.scaffold599 = 0005604
SOAP.k35.scaffold599 = 0030054
SOAP.k35.scaffold599 = 0036062
SOAP.k35.scaffold599 = 0008021
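The first command in this answer filters with two patterns, `/^SOAP/ || /^TRINITY/`, applied to the whole line. A sketch of an alternative, assuming the same input format, tests the identifier field with a single alternation regex instead. It behaves the same for lines without leading whitespace; it differs only in that `$1` skips leading blanks, whereas `^` on the whole line does not. The function name `expand_prefixed` is hypothetical.

```shell
#!/usr/bin/env bash
# expand_prefixed (hypothetical name): expand only lines whose identifier
# starts with SOAP or TRINITY, using one ERE alternation on $1.
expand_prefixed() {
  awk '$1 ~ /^(SOAP|TRINITY)/ {     # skip any other identifiers
    for (i = 2; i <= NF; i++)
      print $1 " = " $i
  }'
}

# Usage example: the OTHER line is ignored
printf 'TRINITY_x 1\nOTHER 2\nSOAP.y 3 4\n' | expand_prefixed
```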

Answer 1 (score: 1)

Simple:

$ awk '{for(i=2;i<=NF;i++) print $1,"=",$i}' file

SOAP.k35.scaffold280 = 0003723
SOAP.k35.scaffold429 = 0004930
SOAP.k35.scaffold429 = 0016021
TRINITY_DN23171_c1_g1_i2 = 0006457
TRINITY_DN23171_c1_g1_i2 = 0005509
TRINITY_DN23171_c1_g1_i2 = 0030246
TRINITY_DN23171_c1_g1_i2 = 0051082
TRINITY_DN23171_c1_g1_i2 = 0005788
...
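This version writes `print $1,"=",$i`: each comma in a `print` statement emits the output field separator `OFS`, which is a single space by default, so the result matches the string-concatenation form in the previous answer. Default field splitting also ignores leading and trailing blanks, so a line like the first sample line, which ends in a trailing space, still has `NF == 2`. The commands below are a small illustrative check of both points; the `ID` input strings are made up for the demonstration.

```shell
#!/usr/bin/env bash
# Trailing blanks do not create an empty extra field with the default FS.
printf 'ID 0003723 \n' | awk '{ print NF }'

# Commas in print insert OFS (a space by default) ...
printf 'ID 1\n' | awk '{for(i=2;i<=NF;i++) print $1,"=",$i}'

# ... so changing OFS changes the separator (here to a tab).
printf 'ID 1\n' | awk -v OFS='\t' '{for(i=2;i<=NF;i++) print $1,"=",$i}'
```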

Answer 2 (score: 0)

You can also try Perl:

$ perl -ne ' ($x)=$_=~m/(^\S+)/; while( /\s(\d+)/g ) { print "$x = $1\n" } ' scottc.txt
SOAP.k35.scaffold280 = 0003723
SOAP.k35.scaffold429 = 0004930
SOAP.k35.scaffold429 = 0016021
TRINITY_DN23171_c1_g1_i2 = 0006457
TRINITY_DN23171_c1_g1_i2 = 0005509
TRINITY_DN23171_c1_g1_i2 = 0030246
TRINITY_DN23171_c1_g1_i2 = 0051082
TRINITY_DN23171_c1_g1_i2 = 0005788
SOAP.k35.scaffold599 = 0007411
SOAP.k35.scaffold599 = 0033627
SOAP.k35.scaffold599 = 0035001
SOAP.k35.scaffold599 = 0016321
SOAP.k35.scaffold599 = 0007507
SOAP.k35.scaffold599 = 0035011
...
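The Perl answer only prints tokens matched by `\s(\d+)`, i.e. runs of digits that follow whitespace. If the same "numbers only" behaviour is wanted in awk, a rough analogue, assuming whitespace-separated fields, is to test each field against `^[0-9]+$`. This is stricter than the Perl regex, which would also pick up the leading digits of a token such as `123abc`. The function name `expand_numeric` is hypothetical.

```shell
#!/usr/bin/env bash
# expand_numeric (hypothetical name): like the plain awk loop, but keeps
# only fields made up entirely of digits (leading zeros are preserved
# because each field is printed as a string).
expand_numeric() {
  awk '{
    for (i = 2; i <= NF; i++)
      if ($i ~ /^[0-9]+$/)          # skip non-numeric tokens
        print $1 " = " $i
  }'
}

# Usage example: the "n/a" token is skipped
printf 'SOAP.k35.scaffold429 0004930 n/a 0016021\n' | expand_numeric
```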