How can I use Unix commands to join multiple lines under the first occurrence of a header?

Time: 2014-06-24 15:50:33

Tags: regex unix

I have a file like this:

Query=scaffold1_size75580
lcl|Os10t0535800-01
Query=scaffold1_size75580
lcl|Os10t0536000-02
Query=scaffold1_size75580
lcl|Os10t0536100-01
Query=scaffold1_size75580
lcl|Os10t0536400-01
Query=scaffold1_size75580
lcl|Os10t0536700-01
Query=scaffold1_size75580
lcl|Os10t0536700-01
Query=scaffold1_size75580
lcl|Os10t0536900-00
Query=scaffold1_size75580
lcl|Os10t0536700-01
Query=scaffold1_size75580
lcl|Os10t0536700-01
Query=scaffold2_size74975
lcl|Os11t0637501-00
Query=scaffold2_size74975
lcl|Os11t0637600-00
Query=scaffold2_size74975
lcl|Os11t0637800-01
Query=scaffold2_size74975
lcl|Os11t0637800-01
Query=scaffold2_size74975
lcl|Os11t0638200-00
Query=scaffold2_size74975
lcl|Os11t0638700-00
Query=scaffold2_size74975
lcl|Os11t0638700-00
Query=scaffold2_size74975
lcl|Os11t0638700-00
Query=scaffold2_size74975
lcl|Os11t0638700-00
Query=scaffold2_size74975
lcl|Os11t0638700-00
Query=scaffold2_size74975
lcl|Os11t0638900-01
Query=scaffold2_size74975
lcl|Os11t0638900-01
Query=scaffold3_size69500
lcl|Os06t0725100-01
Query=scaffold3_size69500
lcl|Os06t0724900-01
Query=scaffold3_size69500
lcl|Os06t0724900-01
Query=scaffold3_size69500
lcl|Os06t0724700-01
Query=scaffold3_size69500
lcl|Os06t0724700-01
Query=scaffold3_size69500
lcl|Os06t0724600-01
Query=scaffold3_size69500
lcl|Os06t0724100-02
Query=scaffold3_size69500
lcl|Os06t0724100-02
Query=scaffold3_size69500
lcl|Os06t0724100-02
Query=scaffold3_size69500
lcl|Os06t0724100-02
Query=scaffold4_size68019
lcl|Os01t0627550-00
Query=scaffold4_size68019
lcl|Os01t0626900-01
Query=scaffold4_size68019
lcl|Os01t0626400-01
Query=scaffold4_size68019
lcl|Os01t0626400-01
Query=scaffold4_size68019
lcl|Os01t0626400-01
Query=scaffold4_size68019
lcl|Os01t0626100-01
Query=scaffold4_size68019
lcl|Os01t0626100-01
Query=scaffold4_size68019
lcl|Os01t0626100-01
Query=scaffold4_size68019
lcl|Os01t0626032-01
Query=scaffold5_size66739
lcl|Os04t0653200-01
Query=scaffold5_size66739
lcl|Os04t0653400-01
Query=scaffold5_size66739
lcl|Os04t0653400-01
Query=scaffold5_size66739
lcl|Os04t0653600-01
Query=scaffold5_size66739
lcl|Os04t0654600-01
Query=scaffold5_size66739
lcl|Os04t0654600-01
Query=scaffold5_size66739
lcl|Os04t0654600-01
Query=scaffold5_size66739
lcl|Os04t0654600-01
Query=scaffold5_size66739
lcl|Os04t0654600-01
Query=scaffold5_size66739
lcl|Os04t0654600-01
Query=scaffold5_size66739
lcl|Os04t0654600-01
Query=scaffold5_size66739
lcl|Os04t0654600-01
Query=scaffold5_size66739
lcl|Os04t0654600-01
Query=scaffold6_size65486
lcl|Os01t0259900-00
Query=scaffold6_size65486
lcl|Os01t0259400-01
Query=scaffold6_size65486
lcl|Os01t0259400-01
Query=scaffold6_size65486
lcl|Os01t0259400-01
Query=scaffold6_size65486
lcl|Os01t0259400-01
Query=scaffold6_size65486
lcl|Os01t0259200-01
Query=scaffold7_size64123
lcl|Os04t0162100-01
Query=scaffold7_size64123
lcl|Os05t0325000-00
Query=scaffold7_size64123
lcl|Os05t0325000-00
Query=scaffold7_size64123
lcl|Os05t0325000-00
Query=scaffold7_size64123
lcl|Os05t0324600-01
Query=scaffold7_size64123
lcl|Os05t0324600-01

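and so on, up to around scaffold 66000. I want the repeated headers removed so that all the corresponding entries sit under a single header, i.e. I want this: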

Query=scaffold1_size75580
lcl|Os10t0535800-01
lcl|Os10t0536000-02
lcl|Os10t0536100-01
lcl|Os10t0536400-01
lcl|Os10t0536700-01
lcl|Os10t0536700-01
lcl|Os10t0536900-00
lcl|Os10t0536700-01
lcl|Os10t0536700-01
Query=scaffold2_size74975
lcl|Os11t0637501-00
lcl|Os11t0637600-00
lcl|Os11t0637800-01
lcl|Os11t0637800-01
lcl|Os11t0638200-00
lcl|Os11t0638700-00
lcl|Os11t0638700-00
lcl|Os11t0638700-00
lcl|Os11t0638700-00
lcl|Os11t0638700-00
lcl|Os11t0638900-01
lcl|Os11t0638900-01
Query=scaffold3_size69500
lcl|Os06t0725100-01
lcl|Os06t0724900-01
lcl|Os06t0724900-01
lcl|Os06t0724700-01
lcl|Os06t0724700-01
lcl|Os06t0724600-01
lcl|Os06t0724100-02
lcl|Os06t0724100-02
lcl|Os06t0724100-02
lcl|Os06t0724100-02

and so on. How can this be done?

1 answer:

Answer 0 (score: 1)

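If you don't mind making multiple passes, I would suggest the following. Each pass merges only some of the repeated headers, so repeat the replacement until nothing more matches. The pattern needs a regex engine that supports lookahead, with ^ matching at the start of every line (multiline mode).

Search: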

^(Query=.*\n)((?:(?!Query=).*\n)+)\1

Replace:

\1\2

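
If a single pass is preferred, a different technique (not the regex above) is to remember the previous header and print a header line only when it changes. The sketch below does that with awk. The function name merge_singlepass and the sample data are hypothetical, and it assumes, as in the question, that identical headers always appear in one consecutive run.

```shell
#!/bin/sh
# Hypothetical helper: reads stdin, or the files given as arguments.
merge_singlepass() {
    awk '
        /^Query=/ {
            # Print a header only when it differs from the previous header.
            if ($0 != prev) print
            prev = $0
            next
        }
        # Every non-header line is passed through unchanged.
        { print }
    ' "$@"
}

# Small demonstration on made-up sample data.
printf 'Query=a\nx\nQuery=a\ny\nQuery=a\nz\nQuery=b\nw\n' | merge_singlepass
```

This streams the file line by line, so it should cope with large inputs. A header that reappears later, after a different header, would start a new block rather than being merged into the earlier one.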