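In the example below, I want to arrange these lines into two columns, where column 1 is the letter string and column 2 is the number after the "-" sign.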
>1-1112309
GTTTCCGTAGTGTAGTGGTTATCACGTTCGCCT
>2-787704
TGAGGTAGTAGGTTGTATAGTT
>3-736193
GTTTCCGTAGTGTAGTGGTTATCACGTTCGCC
>4-671373
TGTAAACATCCTCGACTGGAAGCT
Desired output:
GTTTCCGTAGTGTAGTGGTTATCACGTTCGCCT 1112309
TGAGGTAGTAGGTTGTATAGTT 787704
GTTTCCGTAGTGTAGTGGTTATCACGTTCGCC 736193
TGTAAACATCCTCGACTGGAAGCT 671373
Answer 0 (score: 1)
awk -F- '/^>/ {n = $2; next} {printf "%-40s %d\n", $0, n}' file
Explanation:
-F- # set field separator to a dash
/^>/ # if line begins with a >
{n = $2; next} # then save second field and go on to next line in file
# empty pattern matches every line (that makes it here)
{printf "%-40s %d\n", $0, n} # print current line in 40 columns left-justified
# then print saved number and a newline
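The command above pads the sequence to 40 columns, whereas the desired output in the question shows a single space between the two columns. A minimal sketch of that variant, assuming fixed-width alignment is not needed and that every sequence sits on one line (the `input` and `result` variable names are just for illustration):

```shell
# Sample data from the question, kept in a variable for this demo.
input='>1-1112309
GTTTCCGTAGTGTAGTGGTTATCACGTTCGCCT
>2-787704
TGAGGTAGTAGGTTGTATAGTT
>3-736193
GTTTCCGTAGTGTAGTGGTTATCACGTTCGCC
>4-671373
TGTAAACATCCTCGACTGGAAGCT'

# Header lines (starting with ">"): remember the number after the dash.
# Any other line: print it followed by the remembered number,
# joined by a single space (awk's default output separator).
result=$(printf '%s\n' "$input" |
  awk -F- '/^>/ {n = $2; next} {print $0, n}')

printf '%s\n' "$result"
```

Using `print $0, n` instead of `printf "%-40s %d\n"` is the only change; the header-then-sequence logic is the same.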
Answer 1 (score: 1)
Another awk command:
$ awk -v RS="\n>" '{gsub (/\n/," "); gsub (/^.*-/,"",$1); printf "%-40s %d\n", $2,$1}' file
GTTTCCGTAGTGTAGTGGTTATCACGTTCGCCT 1112309
TGAGGTAGTAGGTTGTATAGTT 787704
GTTTCCGTAGTGTAGTGGTTATCACGTTCGCC 736193
TGTAAACATCCTCGACTGGAAGCT 671373
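RS is set to \n>, so awk splits the input file into records at each occurrence of \n> (a newline followed by >). Note that a multi-character RS is honored by gawk and mawk; POSIX leaves its behavior unspecified.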
gsub (/\n/," ") # Replaces all the newlines in each record with a space.
gsub (/^.*-/,"",$1) # Removes all the characters up to and including - in column 1.
printf "%-40s %d\n", $2,$1 # Prints column 2, then column 1, in a formatted way.
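The same record-based idea can be sketched with a single-character RS, which does not depend on multi-character RS support. This is an illustrative variant rather than the answer's own command: it assumes each sequence is on one line and that ">" appears only at the start of header lines. The names `input`, `result`, and `parts` are chosen here for the example.

```shell
# Sample data from the question, kept in a variable for this demo.
input='>1-1112309
GTTTCCGTAGTGTAGTGGTTATCACGTTCGCCT
>2-787704
TGAGGTAGTAGGTTGTATAGTT
>3-736193
GTTTCCGTAGTGTAGTGGTTATCACGTTCGCC
>4-671373
TGTAAACATCCTCGACTGGAAGCT'

# RS=">" makes each header plus its sequence one record.
# With the default FS, newlines also separate fields, so
# $1 is "id-number" and $2 is the sequence.
# The NF pattern skips the empty record before the first ">".
result=$(printf '%s\n' "$input" |
  awk 'BEGIN { RS = ">" }
       NF { split($1, parts, "-"); print $2, parts[2] }')

printf '%s\n' "$result"
```

Here `split` pulls out the number after the dash, so no gsub on the newlines or on $1 is needed.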