Multiline regex with special characters

Asked: 2012-05-09 15:22:06

Tags: regex

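I'm hoping someone can help me with this regex. I have only ever used regexes to pick single words out of a string, so I don't know how to handle multiple lines and what look like ASCII characters.

Here is the block of text: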

Information       - RETAILEAITRT00003 - Traitement        - Processing        - --->  Recovery from 05/09/2012 at 09:17:50 AM 

Information       - RETAILEAITRT00020 - Traitement        - Processing        - --->  Information recovery starts on 05/09/2012 at 09:17:50 AM 

Information       - RETAILEAITRT00021 - Traitement        - Processing        - ---->  File processing: C:\Program Files (x86)\Prog\Prog RIT\Web Orders\live\Prog Import\Order_110039354.tab
Information       - RETAILEAITRT00005 - Traitement        - Processing        - --->  End of information recovery on 05/09/2012 at 09:17:51 AM 
Information       - RETAILEAITRT00006 - Traitement        - Processing        -    -> 6 records read 
Information       - RETAILEAITRT00008 - Traitement        - Processing        -    -> 6 records processed 
Information       - RETAILEAITRT00010 - Traitement        - Processing        -    -> 6 integrated records 
Information       - RETAILEAITRT00015 - Traitement        - Processing        -    ->  No integration errors 

Information       - RETAILEAITRT00020 - Traitement        - Processing        - --->  Information recovery starts on 05/09/2012 at 09:17:51 AM 

Information       - RETAILEAITRT00021 - Traitement        - Processing        - ---->  File processing: C:\Program Files (x86)\Prog\Prog RIT\Web Orders\live\Prog Import\Order_110039355.tab
Third-party       -  : La raison sociale doit �tre renseign�e 
Third-party       - _SHIP : La raison sociale doit �tre renseign�e 
Erreur            - RETAILEAIDOC00008 - Document          - Document          - address The internal reference enables the recovery of a document. It is mandatory 
Erreur            - RETAILEAIDOC00008 - Document          - Document          - address The internal reference enables the recovery of a document. It is mandatory 
Information       - RETAILEAITRT00005 - Traitement        - Processing        - --->  End of information recovery on 05/09/2012 at 09:17:52 AM 
Information       - RETAILEAITRT00006 - Traitement        - Processing        -    -> 4 records read 
Information       - RETAILEAITRT00008 - Traitement        - Processing        -    -> 4 records processed 
Information       - RETAILEAITRT00012 - Traitement        - Processing        -    ->  No records integrated 
Information       - RETAILEAITRT00013 - Traitement        - Processing        -    -> 4 records contain errors 

Information       - RETAILEAITRT00003 - Traitement        - Processing        - --->  Recovery from 05/09/2012 at 09:33:03 AM 

Information       - RETAILEAITRT00020 - Traitement        - Processing        - --->  Information recovery starts on 05/09/2012 at 09:33:03 AM 

Information       - RETAILEAITRT00021 - Traitement        - Processing        - ---->  File processing: C:\Program Files (x86)\Prog\Prog RIT\Web Orders\live\Prog Import\Order_110039356.tab
Information       - RETAILEAITRT00005 - Traitement        - Processing        - --->  End of information recovery on 05/09/2012 at 09:33:05 AM 
Information       - RETAILEAITRT00006 - Traitement        - Processing        -    -> 6 records read 
Information       - RETAILEAITRT00008 - Traitement        - Processing        -    -> 6 records processed 
Information       - RETAILEAITRT00010 - Traitement        - Processing        -    -> 6 integrated records 
Information       - RETAILEAITRT00015 - Traitement        - Processing        -    ->  No integration errors 

Information       - RETAILEAITRT00020 - Traitement        - Processing        - --->  Information recovery starts on 05/09/2012 at 09:33:05 AM 

Information       - RETAILEAITRT00021 - Traitement        - Processing        - ---->  File processing: C:\Program Files (x86)\Prog\Prog RIT\Web Orders\live\Prog Import\Order_110039357.tab
Information       - RETAILEAITRT00005 - Traitement        - Processing        - --->  End of information recovery on 05/09/2012 at 09:33:06 AM 
Information       - RETAILEAITRT00006 - Traitement        - Processing        -    -> 6 records read 
Information       - RETAILEAITRT00008 - Traitement        - Processing        -    -> 6 records processed 
Information       - RETAILEAITRT00010 - Traitement        - Processing        -    -> 6 integrated records 
Information       - RETAILEAITRT00015 - Traitement        - Processing        -    ->  No integration errors

However, I only want this fragment:

Information       - RETAILEAITRT00020 - Traitement        - Processing        - --->  Information recovery starts on 05/09/2012 at 09:17:51 AM 

Information       - RETAILEAITRT00021 - Traitement        - Processing        - ---->  File processing: C:\Program Files (x86)\Prog\Prog RIT\Web Orders\live\Prog Import\Order_110039355.tab
Third-party       -  : La raison sociale doit �tre renseign�e 
Third-party       - _SHIP : La raison sociale doit �tre renseign�e 
Erreur            - RETAILEAIDOC00008 - Document          - Document          - address The internal reference enables the recovery of a document. It is mandatory 
Erreur            - RETAILEAIDOC00008 - Document          - Document          - address The internal reference enables the recovery of a document. It is mandatory 
Information       - RETAILEAITRT00005 - Traitement        - Processing        - --->  End of information recovery on 05/09/2012 at 09:17:52 AM 
Information       - RETAILEAITRT00006 - Traitement        - Processing        -    -> 4 records read 
Information       - RETAILEAITRT00008 - Traitement        - Processing        -    -> 4 records processed 
Information       - RETAILEAITRT00012 - Traitement        - Processing        -    ->  No records integrated 
Information       - RETAILEAITRT00013 - Traitement        - Processing        -    -> 4 records contain errors 

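Some of the special characters also show up as strange question marks. I really don't know where to start. I suppose the regex would have to look for ^Erreur and then grab the lines above and below it until it finds a ^ followed by whitespace...?

Thanks

1 answer:

Answer 0 (score: 0)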

I was able to get it working with the following regex:

Information.+recovery starts.+\n\n(?:.+\n)+(?:Erreur.+\n)+(?:.+\n)+

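Note: this needs the "g" (global) flag so that every matching block is returned, not just the first one (tested successfully in JavaScript). I'm not sure which language you're using, but it should have an equivalent flag.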
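To illustrate what the "g" flag changes, here is a minimal sketch. The log text is a shortened, hypothetical stand-in for the data in the question (the file names `Order_A.tab` and `Order_B.tab` are made up), and the variable names are illustrative only.

```javascript
// The pattern from this answer, once without and once with the global flag.
const firstMatchOnly = /Information.+recovery starts.+\n\n(?:.+\n)+(?:Erreur.+\n)+(?:.+\n)+/;
const allMatches = /Information.+recovery starts.+\n\n(?:.+\n)+(?:Erreur.+\n)+(?:.+\n)+/g;

// Hypothetical, shortened log containing two blocks that each have an "Erreur" line.
const sampleLog = [
  "Information - Information recovery starts on 05/09/2012 at 09:17:51 AM",
  "",
  "Information - File processing: Order_A.tab",
  "Erreur - The internal reference is mandatory",
  "Information - 4 records contain errors",
  "",
  "Information - Information recovery starts on 05/09/2012 at 09:33:05 AM",
  "",
  "Information - File processing: Order_B.tab",
  "Erreur - The internal reference is mandatory",
  "Information - 2 records contain errors",
  "",
].join("\n");

// Without "g", match() returns a match array describing only the first block.
const firstOnly = sampleLog.match(firstMatchOnly);

// With "g", match() returns an array holding every matching block.
const everyBlock = sampleLog.match(allMatches);

console.log(firstOnly[0]);
console.log(everyBlock.length);
```

In other languages the equivalent is usually a "find all" style function rather than a flag.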

Yes, the regex is pretty ugly :). This is basically what it looks for:

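  • A line that starts with "Information" and contains the words "recovery starts"
  • followed by one blank line
  • followed by at least one line of any content
  • followed by at least one line that starts with "Erreur"
  • followed by any consecutive non-blank lines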
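The five parts listed above can be written out one piece at a time, which may make the pattern easier to read. This is only a sketch: the helper name `findFailedBlocks` and the sample log are hypothetical. It also assumes the log might use Windows line endings (`\r\n`). In JavaScript the pattern's `.+\n` would not match across those, so the sketch normalizes them first.

```javascript
// Each piece corresponds to one bullet in the list above.
const parts = [
  "Information.+recovery starts.+\\n", // line with "Information" ... "recovery starts"
  "\\n",                                // one blank line
  "(?:.+\\n)+",                         // at least one line of any content
  "(?:Erreur.+\\n)+",                   // at least one line starting with "Erreur"
  "(?:.+\\n)+",                         // the remaining non-blank lines
];
const failedBlock = new RegExp(parts.join(""), "g");

// Hypothetical helper: returns every block that contains an "Erreur" line.
function findFailedBlocks(logText) {
  // Assumption: convert Windows line endings so the bare \n in the pattern applies.
  const normalized = logText.replace(/\r\n/g, "\n");
  return normalized.match(failedBlock) || [];
}

// Hypothetical, shortened log: one clean block, then one block with an error.
const crlfLog = [
  "Information - Information recovery starts on 05/09/2012 at 09:17:50 AM",
  "",
  "Information - File processing: Order_OK.tab",
  "Information - No integration errors",
  "",
  "Information - Information recovery starts on 05/09/2012 at 09:17:51 AM",
  "",
  "Information - File processing: Order_BAD.tab",
  "Erreur - The internal reference is mandatory",
  "Information - 4 records contain errors",
  "",
].join("\r\n");

const blocks = findFailedBlocks(crlfLog);
console.log(blocks.length);
console.log(blocks[0]);
```

Because `.+` needs at least one character, none of the groups can cross a blank line, which is what keeps a match from spilling into the neighbouring block.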