我有一个csv
文件,我想在\n
之后用GCA_*
替换逗号。
输入:
ASM190063v1,Escherichia coli(E.coli),strain=D3,562,SAMN03252421,PRJNA269191,Nanjing Agricultural University,2016-12-12,n/a,major,Complete Genome,full,Newbler v. 2.7,30-80x,Illumina Miseq; Roche 454 GS Junior,GCA_001900635.1,ASM301855v1,Escherichia coli (E. coli),strain=2013C-4225,562,SAMN08579596,PRJNA218110,CDC,2018-3-26,n/a,major,Complete Genome,full,HGAP v. 3,yes,76.725x,PacBio,ASM330895v1,Escherichia coli (E. coli),strain=2017C-4109,562,SAMN09534373,PRJNA218110,CDC,2018-7-10,n/a,major,Complete Genome,full,HGAP v. 3,yes,286.7X,PacBio
所需的输出:
ASM190063v1,Escherichia coli(E.coli),strain=D3,562,SAMN03252421,PRJNA269191,Nanjing Agricultural University,2016-12-12,n/a,major,Complete Genome,full,Newbler v. 2.7,30-80x,Illumina Miseq; Roche 454 GS Junior,GCA_001900635.1
ASM301855v1,Escherichia coli (E. coli),strain=2013C-4225,562,SAMN08579596,PRJNA218110,CDC,2018-3-26,n/a,major,Complete Genome,full,HGAP v. 3,yes,76.725x,PacBio
ASM330895v1,Escherichia coli (E. coli),strain=2017C-4109,562,SAMN09534373,PRJNA218110,CDC,2018-7-10,n/a,major,Complete Genome,full,HGAP v. 3,yes,286.7X,PacBio
我的尝试
sed 's/ASM*/\n&/' ordered_lines_per_genome.csv > assembly_report_table.csv
答案 0 :(得分:2)
使用GNU sed:
sed 's/\(GCA_[^,]*\),/\1\n/g' input.csv
\(GCA_[^,]*\),
:匹配GCA*
,后跟逗号。 \(...\)
定义了一个组,以后可以在替换字符串中使用。\1\n
:从匹配项中插入组(“ GCA *”)并添加换行符。要直接更改文件,请执行以下操作:
sed -i 's/\(GCA_[^,]*\),/\1\n/g' input.csv
或者通过注释修复命令行:
sed 's/ASM[^,]*/\n&/g' input.csv
或更佳:为了防止尾随逗号:
sed 's/,\(ASM[^,]*\)/\n\1/g' input.csv
答案 1 :(得分:2)
您可能正在寻找这个简单的GNU sed
:
$ sed 's/,/\n/16;P;D' file
ASM190063v1,Escherichia coli(E.coli),strain=D3,562,SAMN03252421,PRJNA269191,Nanjing Agricultural University,2016-12-12,n/a,major,Complete Genome,full,Newbler v. 2.7,30-80x,Illumina Miseq; Roche 454 GS Junior,GCA_001900635.1
ASM301855v1,Escherichia coli (E. coli),strain=2013C-4225,562,SAMN08579596,PRJNA218110,CDC,2018-3-26,n/a,major,Complete Genome,full,HGAP v. 3,yes,76.725x,PacBio
ASM330895v1,Escherichia coli (E.coli),strain=2017C-4109,562,SAMN09534373,PRJNA218110,CDC,2018-7-10,n/a,major,Complete Genome,full,HGAP v. 3,yes,286.7X,PacBio
s/,/\n/16
:用换行符,
替换第16个逗号\n
P
:将行打印到第一个换行符\n
D
:删除打印的文本,并使用剩余的文本再次开始循环答案 2 :(得分:2)
您应该删除*
并为全局添加g
sed 's/ASM/\n&/g' ordered_lines_per_genome.csv > assembly_report_table.csv
当您不想使用逗号时,可以使用
sed 's/,ASM/\nASM/g' ordered_lines_per_genome.csv > assembly_report_table.csv
为了娱乐,请使用awk:
awk 'BEGIN {RS="ASM"} NF {print "ASM" $0}' ordered_lines_per_genome.csv
如果您不想在行尾使用逗号,则可以使用
awk 'BEGIN {RS="[,]*ASM"} NF {print "ASM" $0}' ordered_lines_per_genome.csv
答案 3 :(得分:0)
awk解决方案:
$ awk -F, '{i=0;while((++i)<=NF)printf $i ((!(i%16) || i==NF)? ORS : ",")}' mb.csv
ASM190063v1,Escherichia coli(E.coli),strain=D3,562,SAMN03252421,PRJNA269191,Nanjing Agricultural University,2016-12-12,n/a,major,Complete Genome,full,Newbler v. 2.7,30-80x,Illumina Miseq; Roche 454 GS Junior,GCA_001900635.1
ASM301855v1,Escherichia coli (E. coli),strain=2013C-4225,562,SAMN08579596,PRJNA218110,CDC,2018-3-26,n/a,major,Complete Genome,full,HGAP v. 3,yes,76.725x,PacBio
ASM330895v1,Escherichia coli (E. coli),strain=2017C-4109,562,SAMN09534373,PRJNA218110,CDC,2018-7-10,n/a,major,Complete Genome,full,HGAP v. 3,yes,286.7X,PacBio
类似于mickp's answer,一行16个字段。
如果确定输入文件只有一行,则可以删除前一个i=0;
。
如果“ ASM”相对唯一,则可以使用自己的方式(以ASM作为行开头):
awk '{print gensub(",ASM","\nASM","g")}' mb.csv
也就是说:
awk '{print gensub(",ASM","\nASM","g")}' ordered_lines_per_genome.csv > assembly_report_table.csv
为您
答案 4 :(得分:0)
使用Perl并假设id以ASM开头。
$ cat maryem.txt
ASM190063v1,Escherichia coli(E.coli),strain=D3,562,SAMN03252421,PRJNA269191,Nanjing Agricultural University,2016-12-12,n/a,major,Complete Genome,full,Newbler v. 2.7,30-80x,Illumina Miseq; Roche 454 GS Junior,GCA_001900635.1,ASM301855v1,Escherichia coli (E. coli),strain=2013C-4225,562,SAMN08579596,PRJNA218110,CDC,2018-3-26,n/a,major,Complete Genome,full,HGAP v. 3,yes,76.725x,PacBio,ASM330895v1,Escherichia coli (E. coli),strain=2017C-4109,562,SAMN09534373,PRJNA218110,CDC,2018-7-10,n/a,major,Complete Genome,full,HGAP v. 3,yes,286.7X,PacBio
$ perl -pe ' s/([^^]ASM.+?,)/\n$1/g; s/^,//mg; ' maryem.txt
ASM190063v1,Escherichia coli(E.coli),strain=D3,562,SAMN03252421,PRJNA269191,Nanjing Agricultural University,2016-12-12,n/a,major,Complete Genome,full,Newbler v. 2.7,30-80x,Illumina Miseq; Roche 454 GS Junior,GCA_001900635.1
ASM301855v1,Escherichia coli (E. coli),strain=2013C-4225,562,SAMN08579596,PRJNA218110,CDC,2018-3-26,n/a,major,Complete Genome,full,HGAP v. 3,yes,76.725x,PacBio
ASM330895v1,Escherichia coli (E. coli),strain=2017C-4109,562,SAMN09534373,PRJNA218110,CDC,2018-7-10,n/a,major,Complete Genome,full,HGAP v. 3,yes,286.7X,PacBio
$