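I have a .NET application that uses the .NET Regex class to match strings of EPL label text. Normally I would use something like ^[A-Z0-9,]+"(.+)"$ and it would match every line (it captures the text between the EPL codes). Recently, however, the EPL has changed, and every EPL line now ends with a line break, \x0D\x0A.
So I changed the pattern to [((\r\n)|(\x0D\x0A))A-Z0-9,]+"(.+)" and now it only picks up Keep out of the reach of children and doesn't recognize the rest.
How can I match the text between the EPL codes?
This is the raw EPL I am trying to match: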
N 0D0A A230,1,0,2,1,1,N,"Keep out of the reach of children"0D0A A133,26,0,4,1,1,N,"FUROSEMIDE TABLETS 40 MG"0D0A A133,51,0,4,1,1,N,"ONE IN THE MORNING"0D0A A133,76,0,4,1,1,N,""0D0A A133,101,0,4,1,1,N,""0D0A A133,126,0,4,1,1,N,""0D0A A133,151,0,4,1,1,N,""0D0A A133,176,0,4,1,1,N,"19/04/13 28 TABLET(S)"0D0A A133,201,0,4,1,1,N,"ELIZABETH M SMITH"0D0A LO133,232,550,40D0A A133,242,0,2,1,1,N,"Any Medical Centre,Blue Road"0D0A A133,260,0,2,1,1,N,"DN54 5TZ,Tel:01424 503901"0D0A P1
Answer 0: (score: 2)
I think you're looking for the RegexOptions.Multiline option. For example:
Regex myEx = new Regex("^[A-Z0-9,]+\".+?\"$", RegexOptions.Multiline);
Actually, the regular expression should be:
"^[A-Z0-9,]+\".*\"\r?$"
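With RegexOptions.Multiline, $ looks for the newline character \n. But the file has Windows line endings (\r\n), so the regex finds the closing quote, reaches $, and expects \n next, where it finds \r instead and the match fails. My modified regex skips the \r if it is there.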
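The difference is easy to check in isolation. The following is a minimal sketch, not code from the question: the class name CrLfAnchorDemo, the helper CountMatches, and the two shortened sample lines are made up for illustration. It counts matches for both patterns against text that uses \r\n line endings.

```csharp
using System;
using System.Text.RegularExpressions;

public static class CrLfAnchorDemo
{
    // Counts how many times the pattern matches the input with RegexOptions.Multiline.
    public static int CountMatches(string pattern, string input)
    {
        return Regex.Matches(input, pattern, RegexOptions.Multiline).Count;
    }

    public static void Main()
    {
        // Two hypothetical EPL lines terminated by Windows line endings.
        string text = "A1,1,0,2,1,1,N,\"FIRST\"\r\nA2,1,0,2,1,1,N,\"SECOND\"\r\n";

        // Without \r?, '$' cannot match: the closing quote is followed by \r, not \n.
        Console.WriteLine(CountMatches("^[A-Z0-9,]+\".*\"$", text));      // prints 0

        // With \r?, the optional carriage return is consumed just before '$'.
        Console.WriteLine(CountMatches("^[A-Z0-9,]+\".*\"\\r?$", text)); // prints 2
    }
}
```

Note that with the second pattern the matched value still ends in \r, because \r? is part of the match.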
If you want to keep those characters out of the result, create a capture group and read it instead of the whole match:
"^([A-Z0-9,]+\".*\")\r?$"
Alternatively, you can filter them out by calling Trim on each result:
MatchCollection matches = myEx.Matches(text);
foreach (Match m in matches)
{
string s = m.Value.Trim(); // removes trailing \r
}
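Since the question asks for the text between the quotes, another option is to put the capture group around just that text and read Groups[1] in a single pass. This is a sketch under assumptions rather than a tested drop-in: the names EplText, QuotedField and ExtractTexts are hypothetical, and the input is assumed to contain real \r\n characters, not the 0D0A hex shown in the dump.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class EplText
{
    // Group 1 holds the text between the quotes; \r? absorbs the CR of a CRLF line ending.
    private static readonly Regex QuotedField =
        new Regex("^[A-Z0-9,]+\"(.*)\"\\r?$", RegexOptions.Multiline);

    // Returns the quoted text of every line that matches, in document order.
    public static List<string> ExtractTexts(string epl)
    {
        var texts = new List<string>();
        foreach (Match m in QuotedField.Matches(epl))
        {
            texts.Add(m.Groups[1].Value);
        }
        return texts;
    }

    public static void Main()
    {
        // A shortened version of the label from the question.
        string epl = "N\r\n" +
            "A230,1,0,2,1,1,N,\"Keep out of the reach of children\"\r\n" +
            "A133,76,0,4,1,1,N,\"\"\r\n" +
            "LO133,232,550,4\r\n" +
            "A133,201,0,4,1,1,N,\"ELIZABETH M SMITH\"\r\n" +
            "P1\r\n";

        // Bracket each value so empty strings are visible in the output.
        foreach (string t in ExtractTexts(epl))
        {
            Console.WriteLine("[" + t + "]");
        }
    }
}
```

Lines without a quoted field (N, LO..., P1) do not match and are skipped. The pattern uses .* where the question used .+, so empty fields come back as empty strings; switch to .+ if they should be dropped.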
Answer 1: (score: 0)
Thanks Jim, I tried your suggestion and it worked...
I used the following...
' Each 0D0A in the raw dump is a CR LF pair, written here as vbCrLf; embedded quotes are doubled so the literal compiles.
Dim sText As String = String.Join(vbCrLf, {
    "N",
    "A230,1,0,2,1,1,N,""Keep out of the reach of children""",
    "A133,26,0,4,1,1,N,"" FUROSEMIDE TABLETS 40 MG""",
    "A133,51,0,4,1,1,N,"" ONE IN THE MORNING""",
    "A133,76,0,4,1,1,N,""""",
    "A133,101,0,4,1,1,N,""""",
    "A133,126,0,4,1,1,N,""""",
    "A133,151,0,4,1,1,N,""""",
    "A133,176,0,4,1,1,N,""19/04/13 28 TABLET(S)""",
    "A133,201,0,4,1,1,N,""ELIZABETH M SMITH""",
    "LO133,232,550,4",
    "A133,242,0,2,1,1,N,""Any Medical Centre,Blue Road""",
    "A133,260,0,2,1,1,N,""CN54 1TZ,Tel:01424 503901""",
    "P1"})
Dim sRet As String = String.Empty
Dim sTemp As String = String.Empty
Dim m As Match
Dim grp As System.Text.RegularExpressions.Group
Dim sPattern As String = "^([A-Z0-9,])+\"".*\""\r?$"
Dim sPatternRegex As New Regex(sPattern, RegexOptions.Multiline)
Dim matches As MatchCollection = sPatternRegex.Matches(sText)
For Each m In matches
' removes trailing \r
'Dim s As String = m.Value.Trim()
sTemp += m.Value.Trim() + vbCrLf
Next
' The previous code detects where the line feeds are, replaces the old one with a standard vbCrLF, then the following code parses it like normal
sPattern = "^[A-Z0-9,]+\""(.+)\""$"
' Standard WinPrint EPL Label: The parsed version would appear as: ^[A-Z0-9,]+\"(.+)\"$
For Each s As String In sTemp.Split(vbCrLf)
m = Regex.Match(s.Trim, sPattern)
grp = m.Groups(1)
sRet += grp.Value + vbCrLf
Next
Return sRet.Trim