I have a file like this:
scaffold1_size143306
Os03t0746800-01
scaffold1_size143306
Os03t0746800-01
scaffold1_size143306
Os03t0746500-01
scaffold1_size143306
Os03t0746500-01
scaffold1_size143306
Os03t0746500-01
scaffold1_size143306
Os03t0746500-01
scaffold1_size143306
Os03t0746500-01
scaffold1_size143306
Os03t0746500-01
scaffold1_size143306
Os03t0746500-01
scaffold1_size143306
Os03t0746500-01
scaffold1_size143306
Os03t0746500-01
scaffold1_size143306
Os03t0746500-01
scaffold1_size143306
Os03t0746500-01
scaffold1_size143306
Os03t0746500-01
scaffold1_size143306
Os03t0746500-01
scaffold1_size143306
Os03t0746500-01
scaffold1_size143306
Os03t0746500-01
scaffold1_size143306
Os03t0746500-01
scaffold1_size143306
Os03t0746500-01
scaffold1_size143306
Os03t0746500-01
scaffold1_size143306
Os03t0123800-01
scaffold1_size143306
Os03t0123800-01
scaffold1_size143306
Os02t0708500-01
scaffold1_size143306
Os02t0708500-01
scaffold1_size143306
Os02t0708200-01
scaffold1_size143306
Os02t0708200-01
scaffold1_size143306
Os02t0708200-01
scaffold1_size143306
Os02t0708200-01
scaffold1_size143306
Os02t0708200-01
scaffold1_size143306
Os02t0708200-01
scaffold1_size143306
Os02t0707900-01
scaffold1_size143306
Os02t0707900-01
scaffold1_size143306
Os02t0707900-01
scaffold1_size143306
Os02t0707900-01
scaffold1_size143306
Os02t0707900-01
scaffold1_size143306
Os02t0707900-01
scaffold1_size143306
Os02t0707900-03
scaffold2_size121414
Os06t0136900-01
scaffold2_size121414
Os06t0136900-01
scaffold2_size121414
Os06t0136700-01
scaffold2_size121414
Os06t0136600-01
scaffold2_size121414
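and so on, up to some 55900 scaffolds. I want to remove the duplicate headers from the file so that all corresponding entries sit under a single header, i.e. I want this: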
scaffold1_size143306
Os03t0746800-01
Os03t0746800-01
Os03t0746500-01
Os03t0746500-01
Os03t0746500-01
Os03t0746500-01
Os03t0746500-01
Os03t0746500-01
Os03t0746500-01
Os03t0746500-01
Os03t0746500-01
Os03t0746500-01
Os03t0746500-01
Os03t0746500-01
Os03t0746500-01
Os03t0746500-01
Os03t0746500-01
Os03t0746500-01
Os03t0746500-01
Os03t0746500-01
Os03t0123800-01
Os03t0123800-01
Os02t0708500-01
Os02t0708500-01
Os02t0708200-01
Os02t0708200-01
Os02t0708200-01
Os02t0708200-01
Os02t0708200-01
Os02t0708200-01
Os02t0707900-01
Os02t0707900-01
Os02t0707900-01
Os02t0707900-01
Os02t0707900-01
Os02t0707900-01
Os02t0707900-03
scaffold2_size121414
Os06t0136900-01
Os06t0136900-01
Os06t0136700-01
Os06t0136600-01
Os06t0135900-01
Os06t0135900-01
Os06t0135900-01
Os06t0135900-01
Os06t0135900-01
Os06t0135900-01
Os06t0135900-01
Os06t0135900-01
Os06t0135900-01
Os06t0135900-01
Os06t0134300-01
Os06t0134300-01
Os06t0134300-01
Os06t0134300-01
Os06t0134300-01
Os06t0134300-01
Os06t0134300-01
Os06t0134300-01
and so on for each scaffold. The question above has "query" written in front of it, so I can't use that expression. How can I do this?
Answer 0 (score: 0)
Some pseudocode to get you started:
while there are more lines in the file
    read the next line
    if the line starts with "scaffold" (header line)
        if it differs from the last header seen (or it is the first header)
            currentHeader = the line just read
            output it
        else
            discard the line, because you don't want duplicate headers
    else
        output the body line
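
The pseudocode above could be turned into something like the following Python sketch. It is only one possible reading of the question: the function name `collapse_headers` and the `prefix` parameter are made up for illustration, and it assumes that every header line starts with "scaffold" and that the repeated headers for one scaffold are grouped together, as in the sample.

```python
from typing import Iterable, Iterator, Optional


def collapse_headers(lines: Iterable[str], prefix: str = "scaffold") -> Iterator[str]:
    """Yield the input lines, printing a header only when it changes.

    Assumption: a header is any line starting with `prefix`; everything
    else is treated as a body line and passed through unchanged.
    """
    current_header: Optional[str] = None
    for raw in lines:
        line = raw.rstrip("\n")
        if not line:
            continue  # skip blank lines
        if line.startswith(prefix):
            # Emit the header only if it differs from the last one seen.
            if line != current_header:
                current_header = line
                yield line
        else:
            yield line


# Small in-memory sample taken from the lines in the question.
sample = [
    "scaffold1_size143306", "Os03t0746800-01",
    "scaffold1_size143306", "Os03t0746500-01",
    "scaffold2_size121414", "Os06t0136900-01",
]
print("\n".join(collapse_headers(sample)))
```

To run it on a real file, pass an open file object instead of `sample`, since a file iterates line by line. Note that this only compares each header with the previous one, so a scaffold that shows up again later in the file, after a different scaffold, would get its header printed a second time. If that can happen, the entries would need to be collected per header (for example in a dict) before writing anything out.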