I have a .csv file containing three elements that I want to split further. The lines in the file look like this:
gene_id "ENSDARG00000104632", gene_version "2", gene_name "RERG"
gene_id "ENSDARG00000104632", gene_version "2", transcript_id "ENSDART00000166186"
gene_id "ENSDARG00000104632", gene_version "2", transcript_id "ENSDART00000166186"
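I want to take the strings out of the double quotes ("") and split everything into separate elements delimited by commas. Basically, I want it to look like this: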
gene_id, ENSDARG00000104632, gene_version, 2, gene_name, RERG
gene_id, ENSDARG00000104632, gene_version, 2, transcript_id, ENSDART00000166186
gene_id, ENSDARG00000104632, gene_version, 2, transcript_id, ENSDART00000166186
My original idea was to do this:
awk 'BEGIN{OFS=",";FS="""};{print $1,$2,$3,$4,$5,$6}'
However, AWK doesn't seem to recognize " as a delimiter. Does anyone have a suggestion for how to achieve this?
Answer 0 (score: 2)
$ awk -F'[ ",]+' -v OFS=', ' '{sub(/"$/,""); $1=$1} 1' file
gene_id, ENSDARG00000104632, gene_version, 2, gene_name, RERG
gene_id, ENSDARG00000104632, gene_version, 2, transcript_id, ENSDART00000166186
gene_id, ENSDARG00000104632, gene_version, 2, transcript_id, ENSDART00000166186
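Below is a commented sketch of the same awk command, wrapped in a shell function so it can be reused. The name `to_csv` is hypothetical (it is not part of the answer), and the here-doc just stands in for the question's input file. The bracket expression `[ ",]+` treats any run of spaces, double quotes and commas as a single field separator. That leaves only the closing quote at the end of each line, which `sub()` strips. Passing the pattern through `-F'...'` also avoids having to escape a double quote inside an awk string literal. That escaping is where `FS="""` in the question breaks, since awk reads it as an empty string `""` followed by an unterminated string; a literal quote has to be written `"\""`.

```shell
#!/bin/sh
# to_csv: hypothetical helper wrapping the answer's awk command.
# Reads the named files, or standard input when no file is given.
to_csv() {
    awk -F'[ ",]+' -v OFS=', ' '
    {
        sub(/"$/, "")   # drop the closing quote at the end of the line
        $1 = $1         # assign a field so awk rebuilds $0 joined with OFS
    }
    1                   # always-true pattern: print every rebuilt record
    ' "$@"
}

# Usage on the sample lines from the question (here-doc instead of a file).
to_csv <<'EOF'
gene_id "ENSDARG00000104632", gene_version "2", gene_name "RERG"
gene_id "ENSDARG00000104632", gene_version "2", transcript_id "ENSDART00000166186"
EOF
```

With a real file, the call would be `to_csv file`. The `$1 = $1` assignment is needed because awk only re-joins the fields with `OFS` after one of them has been modified. Without it, the line would be printed unchanged apart from the removed trailing quote.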