I have two columns like this:
cluster22717 GO:0005737,GO:0007049,GO:0051301
How can I convert it into this:
cluster22717 GO:0005737
cluster22717 GO:0007049
cluster22717 GO:0051301
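I should also mention that this is just one line of a file that contains thousands of lines, and the number of elements in the second column varies from line to line. Thanks in advance, Pezhman Safdari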
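Answer 0 (score: 1)
The simplest solution is to use a couple of loops: an outer loop over the lines of the file and an inner loop over the comma-separated values. See the example below.
Input file: sample.txt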
cluster22717 GO:0005737,GO:0007049,GO:0051301
cluster22717 GO:0005738,GO:0007041,GO:0051304,GO:0051307
cluster22717 GO:0005739,GO:0007042,GO:0051305,GO:0005737,GO:0007046
cluster22717 GO:0005740,GO:0007043,GO:0051306,GO:0005738,GO:0007041,GO:0051304
Script:
while read -r line
do
    var1=$(echo "$line" | awk '{print $1}') # assign the first field to var1
    Arrayvals=($(echo "$line" | awk '{print $2}' | sed -e 's/,/ /g')) # create an array from the second field
    for (( i=0; i < ${#Arrayvals[@]}; i++ )) # iterate over the array; ${#Arrayvals[@]} gives the array length
    do
        echo "${var1} ${Arrayvals[${i}]}" # print in the desired format
    done
done < sample.txt
Output:
cluster22717 GO:0005737
cluster22717 GO:0007049
cluster22717 GO:0051301
cluster22717 GO:0005738
cluster22717 GO:0007041
cluster22717 GO:0051304
cluster22717 GO:0051307
cluster22717 GO:0005739
cluster22717 GO:0007042
cluster22717 GO:0051305
cluster22717 GO:0005737
cluster22717 GO:0007046
cluster22717 GO:0005740
cluster22717 GO:0007043
cluster22717 GO:0051306
cluster22717 GO:0005738
cluster22717 GO:0007041
cluster22717 GO:0051304
Hope this helps.
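For a file with thousands of lines, the loop above starts several processes per line, so a single awk pass may be noticeably faster. The following is only a minimal sketch of that idea, assuming the two columns are separated by whitespace and the second column contains no spaces; the function name split_go_terms is hypothetical and used purely for illustration.

```shell
# Hypothetical helper: reads two-column lines on stdin and
# writes one "cluster term" pair per line on stdout.
split_go_terms() {
    awk '{
        n = split($2, terms, ",")   # break the second column into terms[1..n]
        for (i = 1; i <= n; i++)
            print $1, terms[i]      # repeat the first column for every term
    }'
}

# Example call on a single line of input
printf 'cluster22717 GO:0005737,GO:0007049,GO:0051301\n' | split_go_terms
```

With the input file from this answer it would be called as split_go_terms < sample.txt.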
Answer 1 (score: 0)
With sed:
while read -r line;do
left=$(echo $line|grep -oE '^[^ ]+ +') # the left part plus a trailing blank
echo $line |
grep -oE '[^ ]+$' | # take the right part
sed -r "s/([^,]+),?/$left\1\n/g" | # prefix every GO term with the left part and end it with a newline (GNU sed)
grep -v '^$' | # remove the empty line added after the very last group
tee -a output.txt
done<other.txt
Output:
cluster22717 GO:0005737
cluster22717 GO:0007049
cluster22717 GO:0051301
cluster22717 GO:0005738
cluster22717 GO:0007041
cluster22717 GO:0051304
cluster22717 GO:0051307
cluster22717 GO:0005739
cluster22717 GO:0007042
cluster22717 GO:0051305
cluster22717 GO:0005737
cluster22717 GO:0007046
cluster22717 GO:0005740
cluster22717 GO:0007043
cluster22717 GO:0051306
cluster22717 GO:0005738
cluster22717 GO:0007041
cluster22717 GO:0051304
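The same result can probably be had from sed alone, without the surrounding while loop. The sketch below is an assumption-laden variant, not the method used above: it assumes GNU sed (for \n in the replacement), columns separated by spaces, and a hypothetical function name split_with_sed. It peels off one term per cycle using the P and D commands.

```shell
# Hypothetical helper: reads two-column lines on stdin and
# writes one "cluster term" pair per line on stdout (GNU sed assumed).
split_with_sed() {
    # s: turn "left A,rest" into "left A<newline>left rest"
    # P: print the pattern space up to the first newline
    # D: delete up to the first newline and rerun the script on what remains
    sed -E 's/^([^ ]+ +)([^,]+),/\1\2\n\1/; P; D'
}

# Example call on a single line of input
printf 'cluster22717 GO:0005737,GO:0007049,GO:0051301\n' | split_with_sed
```

With the input file from this answer it would be called as split_with_sed < other.txt > output.txt.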