Split the second column based on the first column

Time: 2018-07-04 12:34:50

Tags: linux

I have two columns like this:

cluster22717    GO:0005737,GO:0007049,GO:0051301

How can I convert it to this:

cluster22717    GO:0005737
cluster22717    GO:0007049
cluster22717    GO:0051301

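I should also mention that this is a single line from a file containing thousands of lines, and the number of elements in the second column varies from line to line. Thanks in advance, Pezhman Safdari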

2 answers:

Answer 0 (score: 1)

The simplest solution is to use a loop; see the example below.

Input file: sample.txt

cluster22717    GO:0005737,GO:0007049,GO:0051301
cluster22717    GO:0005738,GO:0007041,GO:0051304,GO:0051307
cluster22717    GO:0005739,GO:0007042,GO:0051305,GO:0005737,GO:0007046
cluster22717    GO:0005740,GO:0007043,GO:0051306,GO:0005738,GO:0007041,GO:0051304

Script:

while IFS= read -r line
do
    var1=$(echo "$line" | awk '{print $1}')                           # assign the first field to var1
    Arrayvals=($(echo "$line" | awk '{print $2}' | sed -e 's/,/ /g')) # build an array from the second field (commas become spaces)

    for (( i=0; i < ${#Arrayvals[@]}; i++ ))  # iterate over the array; ${#Arrayvals[@]} is its length
    do
        echo "${var1}    ${Arrayvals[${i}]}"   # print in the desired format
    done

done < sample.txt

Output:

cluster22717   GO:0005737
cluster22717   GO:0007049
cluster22717   GO:0051301
cluster22717   GO:0005738
cluster22717   GO:0007041
cluster22717   GO:0051304
cluster22717   GO:0051307
cluster22717   GO:0005739
cluster22717   GO:0007042
cluster22717   GO:0051305
cluster22717   GO:0005737
cluster22717   GO:0007046
cluster22717   GO:0005740
cluster22717   GO:0007043
cluster22717   GO:0051306
cluster22717   GO:0005738
cluster22717   GO:0007041
cluster22717   GO:0051304
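
Since the question mentions thousands of lines, note that the loop above starts several processes for every input line. A minimal sketch of a single-pass alternative follows, assuming a POSIX awk and whitespace-separated columns; the function name split_column is hypothetical and used only for illustration.

```shell
#!/usr/bin/env bash
# split_column is a hypothetical helper name, not part of the original answer.
# It reads two-column lines on stdin and prints one line per comma-separated value.
split_column() {
    awk '{
        n = split($2, parts, ",")        # break the second field on commas
        for (i = 1; i <= n; i++)
            print $1 "    " parts[i]     # first field, four spaces, one value
    }'
}

# Demo with the line from the question; for a file, use: split_column < sample.txt
printf 'cluster22717    GO:0005737,GO:0007049,GO:0051301\n' | split_column
```

With the sample.txt shown above, split_column < sample.txt should give the same lines as the loop, because both print the first field followed by one value per line.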

Hope this helps.

Answer 1 (score: 0)

With sed

while read -r line; do
    left=$(echo $line | grep -oE '^[^ ]+ +')  # the left part plus one blank (the unquoted $line collapses repeated whitespace)
    echo $line |
        grep -oE '[^ ]+$' |                   # take the right part
        sed -r "s/([^,]+),?/$left\1\n/g" |    # prefix every GO term with the left part and end it with a newline
        grep -v '^$' |                        # remove the empty line left after the very last group
        tee -a output.txt                     # print the result and append it to output.txt
done < other.txt

Output:

cluster22717 GO:0005737
cluster22717 GO:0007049
cluster22717 GO:0051301
cluster22717 GO:0005738
cluster22717 GO:0007041
cluster22717 GO:0051304
cluster22717 GO:0051307
cluster22717 GO:0005739
cluster22717 GO:0007042
cluster22717 GO:0051305
cluster22717 GO:0005737
cluster22717 GO:0007046
cluster22717 GO:0005740
cluster22717 GO:0007043
cluster22717 GO:0051306
cluster22717 GO:0005738
cluster22717 GO:0007041
cluster22717 GO:0051304
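
The same splitting can probably be done by sed alone, without the shell loop or the grep calls. The sketch below assumes GNU sed (for -E and for \n in the replacement) and whitespace-separated columns; split_column_sed is a hypothetical name used only for illustration. Unlike the loop above, it keeps the original spacing between the columns.

```shell
#!/usr/bin/env bash
# split_column_sed is a hypothetical helper name, not part of the original answer.
# Assumes GNU sed: -E for extended regexes and \n as a newline in the replacement.
split_column_sed() {
    sed -E \
        -e ':a' \
        -e 's/^([^[:space:]]+[[:space:]]+)([^,]+),(.*)$/\1\2\n\1\3/' \
        -e 'ta'
    # :a   marks the start of the loop
    # s    replaces the first remaining comma with a newline plus the key (\1)
    # ta   jumps back to :a while a substitution was made, i.e. while commas remain
}

# Demo with the line from the question; for a file, use: split_column_sed < other.txt
printf 'cluster22717    GO:0005737,GO:0007049,GO:0051301\n' | split_column_sed
```

Lines whose second column holds a single value contain no comma, so the substitution never matches and they pass through unchanged.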