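Multiline regex replace

Asked: 2011-09-28 19:44:16

Tags: regex replace multiline

OK, there are plenty of regex questions out there, but as always, none of them seem to fit my purpose.

I have a text file: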

F00220034277909272011                                  
H001500020003000009272011                              
D001500031034970000400500020000000025000000515000000000
D001500001261770008003200010000000025000000132500000000
H004200020001014209272011                              
D004200005355800007702200005142000013420000000000000000
D004200031137360000779000005000000012000000000000000000
H050100180030263709272011                              
D050100001876700006000300019500000025000000250000001500
D050100001247060000071500030000000025000000280000000000
D050100002075670000430400020000000025000000515000000000
D050100008342500007702600005700000010000000000000000700
D050100009460270000702100015205000025000000000000006205
D050100008135120000702400015000000010000000000000001000
D050100006938430000702200026700000010000000000000001000
D050100006423710008000200025700000000000000000000001000
D050100009488040008000600007175000000000000000000001000
D050100001299190000800100016300000000000000000000003950
D050100001244850000800400005407000000000000000000001607
D050100001216280000840200020000000000000001000000006200
D050100001216840000479000008175000000000000100000001000
D050100001265880000410200014350000000000000100000001000
D050100007402650002000300026700000000000000100000001000
D050100001305150002000200016175000000000001000000000000
D050100005435430000899700022350000000000001000000000000
D050100031113850000500200008200000000250000100000001000

Using a multiline regex (.NET flavor), I want to do a replace so that I end up with:

H050100180030263709272011                              
D050100001876700006000300019500000025000000250000001500
D050100001247060000071500030000000025000000280000000000
D050100002075670000430400020000000025000000515000000000
D050100008342500007702600005700000010000000000000000700
D050100009460270000702100015205000025000000000000006205
D050100008135120000702400015000000010000000000000001000
D050100006938430000702200026700000010000000000000001000
D050100006423710008000200025700000000000000000000001000
D050100009488040008000600007175000000000000000000001000
D050100001299190000800100016300000000000000000000003950
D050100001244850000800400005407000000000000000000001607
D050100001216280000840200020000000000000001000000006200
D050100001216840000479000008175000000000000100000001000
D050100001265880000410200014350000000000000100000001000
D050100007402650002000300026700000000000000100000001000
D050100001305150002000200016175000000000001000000000000
D050100005435430000899700022350000000000001000000000000
D050100031113850000500200008200000000250000100000001000

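So basically, I want to keep every line that starts with [HD]0501.

I know this looks better suited to a match than a replace, but I am going through a pre-built engine that only accepts a regex pattern string and a regex replacement string.

What can I supply for the pattern and the replacement string to get the result I want? Is multiline regex a hard-coded configuration?

I originally thought something like this would work:

Search: (?<Match>^[HD]0501\d+$), but this did not match anything.

Search: (?!^[HD]0501\d+$), but this matched a bunch of empty strings, and I could not figure out what to put in the replacement string.

Search: (?!(?<Omit>^[HD]0501\d+$)), which gave "Group 'Omit' not found."

It seems like this should be simple, but as always, regex manages to make me feel stupid. Any help is much appreciated.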

1 answer:

Answer 0: (score: 3)

Try matching the following pattern:

(?m)^(?![HD]0501).+(\r?\n)?

and replace it with an empty string.

The following demo:

using System;
using System.Text.RegularExpressions;

namespace Test
{
  class MainClass
  {  
    public static void Main (string[] args)
    {
      string input = @"F00220034277909272011                                  
H001500020003000009272011                              
D001500031034970000400500020000000025000000515000000000
D001500001261770008003200010000000025000000132500000000
H004200020001014209272011                              
D004200005355800007702200005142000013420000000000000000
D004200031137360000779000005000000012000000000000000000
H050100180030263709272011                              
D050100001876700006000300019500000025000000250000001500
D050100001247060000071500030000000025000000280000000000
D050100002075670000430400020000000025000000515000000000
D050100008342500007702600005700000010000000000000000700
D050100009460270000702100015205000025000000000000006205
D050100008135120000702400015000000010000000000000001000
D050100006938430000702200026700000010000000000000001000
D050100006423710008000200025700000000000000000000001000
D050100009488040008000600007175000000000000000000001000
D050100001299190000800100016300000000000000000000003950
D050100001244850000800400005407000000000000000000001607
D050100001216280000840200020000000000000001000000006200
D050100001216840000479000008175000000000000100000001000
D050100001265880000410200014350000000000000100000001000
D050100007402650002000300026700000000000000100000001000
D050100001305150002000200016175000000000001000000000000
D050100005435430000899700022350000000000001000000000000
D050100031113850000500200008200000000250000100000001000";

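      // Match every non-empty line that does NOT start with H0501 or D0501,
      // together with its trailing line break, so that it can be removed.
      string regex = @"(?m)^(?![HD]0501).+(\r?\n)?";

      // Replacing each match with an empty string leaves only the wanted lines.
      Console.WriteLine(Regex.Replace(input, regex, ""));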
    }
  }
}

prints:

H050100180030263709272011                              
D050100001876700006000300019500000025000000250000001500
D050100001247060000071500030000000025000000280000000000
D050100002075670000430400020000000025000000515000000000
D050100008342500007702600005700000010000000000000000700
D050100009460270000702100015205000025000000000000006205
D050100008135120000702400015000000010000000000000001000
D050100006938430000702200026700000010000000000000001000
D050100006423710008000200025700000000000000000000001000
D050100009488040008000600007175000000000000000000001000
D050100001299190000800100016300000000000000000000003950
D050100001244850000800400005407000000000000000000001607
D050100001216280000840200020000000000000001000000006200
D050100001216840000479000008175000000000000100000001000
D050100001265880000410200014350000000000000100000001000
D050100007402650002000300026700000000000000100000001000
D050100001305150002000200016175000000000001000000000000
D050100005435430000899700022350000000000001000000000000
D050100031113850000500200008200000000250000100000001000
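
As a quick sanity check of the same pattern on Windows-style line endings, here is a minimal sketch. The class name `RecordFilter`, the method `KeepPrefix0501`, and the shortened sample records are made up for illustration; only the pattern itself comes from the answer above.

```csharp
using System;
using System.Text.RegularExpressions;

public static class RecordFilter
{
    // Pattern from the answer: matches every non-empty line that does not
    // start with H0501 or D0501, plus the line break that follows it.
    public const string Pattern = @"(?m)^(?![HD]0501).+(\r?\n)?";

    // Hypothetical helper: removes the matched lines, keeping the rest.
    public static string KeepPrefix0501(string input)
    {
        return Regex.Replace(input, Pattern, "");
    }

    public static void Main()
    {
        // Shortened, made-up records separated by \r\n.
        string crlf = "F0022\r\nH0015\r\nH0501AAA\r\nD0501BBB\r\nD0042";
        Console.WriteLine(KeepPrefix0501(crlf));
    }
}
```

In .NET, `.` matches `\r`, so on `\r\n` input the `.+` consumes the carriage return and the optional group then consumes the `\n`. Note that when the last line of the input is one of the removed lines, as in this sample, the line break after the last kept line stays in the result.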

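A quick explanation:

  • (?m)
    • enables multiline mode, so that ^ matches at the start of every line;
  • ^
    • matches the start of a line;
  • (?![HD]0501)
    • negative lookahead: asserts that neither "H0501" nor "D0501" follows;
  • .+
    • matches one or more characters other than a line feed (\n);
  • (\r?\n)?
    • matches an optional line break.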
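
On whether multiline mode has to be hard-coded in the engine: the inline `(?m)` at the start of the pattern switches it on from inside the pattern string, so no `RegexOptions` argument is needed. The sketch below compares the two on a shortened, made-up sample; the class name `MultilineFlagDemo` and its method names are hypothetical. It also suggests one likely reason the first attempt in the question matched nothing: without multiline mode, `^` only matches at the very start of the input, which begins with an F record.

```csharp
using System;
using System.Text.RegularExpressions;

public static class MultilineFlagDemo
{
    // Shortened, made-up records; the real file has longer fixed-width lines.
    public const string Sample = "F0022\nH0501AAA\nD0501BBB";

    // No multiline mode: ^ is only tried at the start of the whole input.
    public static int CountWithoutFlag()
    {
        return Regex.Matches(Sample, @"^[HD]0501.*").Count;
    }

    // Inline (?m): ^ also matches right after every \n.
    public static int CountWithInlineFlag()
    {
        return Regex.Matches(Sample, @"(?m)^[HD]0501.*").Count;
    }

    // Same pattern, with the option supplied by the calling code instead.
    public static int CountWithOption()
    {
        return Regex.Matches(Sample, @"^[HD]0501.*", RegexOptions.Multiline).Count;
    }

    public static void Main()
    {
        Console.WriteLine(CountWithoutFlag());
        Console.WriteLine(CountWithInlineFlag());
        Console.WriteLine(CountWithOption());
    }
}
```

The inline form is the one that fits an engine accepting only a pattern string and a replacement string, since it needs no change to the calling code.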