Parsing XML with a regular expression to get the value between tags

Asked: 2011-06-03 23:11:22

Tags: c# regex vb.net

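I have a regular expression that I use to get the data between a pair of tags, for example <CLASSCOD>70</CLASSCOD>. The regex I am using, (?<=<CLASSCOD>)(?:[^<]|<(?!/CLASSCOD))*, works in most cases, but when the element holds a single value such as <CLASSCOD>N</CLASSCOD>, it reports no match.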

The full data string looks like this:

<DATE>0601</DATE>
<YEAR>11</YEAR>
<AGENCY>Department of the Interior</AGENCY>
<OFFICE>Bureau of Indian Affairs</OFFICE>
<LOCATION>BIA - DAPM</LOCATION>
<ZIP>85004</ZIP>
<CLASSCOD>N</CLASSCOD>
<OFFADD>Contracting Office - Western Region 2600 N. Central Avenue, 4th Floor Phoenix AZ 85004</OFFADD>
<SUBJECT>Boiler Replacement</SUBJECT>
<SOLNBR>A11PS00463</SOLNBR>
<RESPDATE>061711</RESPDATE>
<ARCHDATE>05312012</ARCHDATE>
<CONTACT>Geraldine M. Williams Purchasing Agent 6023794087 geraldine.williams@bia.gov;<a href="mailto:EC_helpdesk@NBC.GOV">Point of Contact above, or if none listed, contact the IDEAS EC HELP DESK for assistance</a>
</CONTACT>
<LINK><URL>https://www.fbo.gov/spg/DOI/BIA/RestonVA/A11PS00463/listing.html<LINKDESC>Link To Document</LINK>
<EMAIL></EMAIL>
<EMAIL>
  EC_helpdesk@NBC.GOV
  <EMAILDESC>
    Point of Contact above, or if none listed, contact the IDEAS EC HELP DESK for assistance
  </EMAILDESC>
</EMAIL>
<SETASIDE>Total Small Business</SETASIDE>
<POPCOUNTRY>USA</POPCOUNTRY>
<POPZIP>85634</POPZIP>
<POPADDRESS>BIE Tohono O'odham High School, Sells, AZ</POPADDRESS>

Any suggestions as to why?

Thanks

2 answers:

Answer 0 (score: 2)

Something simpler should work:

<CLASSCOD>(.+?)</CLASSCOD>

Example:

Match match = Regex.Match(input, @"<CLASSCOD>(.+?)</CLASSCOD>");
if (match.Success) {
    string value = match.Groups[1].Value;
    Console.WriteLine(value);
}
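
One thing to watch with this pattern: .+? needs at least one character and, by default, does not cross line breaks, so an empty element such as <EMAIL></EMAIL> or a value that spans several lines would not match. A minimal sketch of a variant that tolerates both, assuming you want a reusable lookup by tag name; the ClassCodReader class and GetValue helper are hypothetical names used only for illustration:

```csharp
using System;
using System.Text.RegularExpressions;

public static class ClassCodReader
{
    // Hypothetical helper (not from the answer): returns the text between
    // <tag> and </tag>, or null when no such element is found.
    public static string GetValue(string input, string tag)
    {
        string name = Regex.Escape(tag);
        // .*? (instead of .+?) also accepts an empty element;
        // RegexOptions.Singleline lets '.' match newline characters.
        Match match = Regex.Match(
            input,
            "<" + name + ">(.*?)</" + name + ">",
            RegexOptions.Singleline);
        return match.Success ? match.Groups[1].Value.Trim() : null;
    }

    public static void Main()
    {
        string input = "<ZIP>85004</ZIP>\n<CLASSCOD>N</CLASSCOD>\n<EMAIL></EMAIL>";
        Console.WriteLine(GetValue(input, "CLASSCOD"));        // N
        Console.WriteLine(GetValue(input, "EMAIL").Length);    // 0
        Console.WriteLine(GetValue(input, "MISSING") == null); // True
    }
}
```

Returning null for a missing element keeps "not found" distinct from "found but empty"; whether that distinction matters depends on how the caller treats blank fields.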

Answer 1 (score: 1)

If you want to extract the values between the angle-bracket tags, you can use the following regex:

<([^>]+)>([^<]*)</\1>

Lookahead and lookbehind operators are not needed in this case.
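
A rough sketch of how this pattern could be applied in C#, assuming you want every simple element in the input rather than just CLASSCOD. The TagExtractor class and Extract method are hypothetical names chosen for the example:

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class TagExtractor
{
    // Pattern from the answer: group 1 is the tag name, group 2 the text,
    // and \1 requires the closing tag to repeat the same name.
    private static readonly Regex TagPattern =
        new Regex(@"<([^>]+)>([^<]*)</\1>");

    // Hypothetical helper: returns (tag, value) pairs in document order.
    public static List<KeyValuePair<string, string>> Extract(string input)
    {
        var result = new List<KeyValuePair<string, string>>();
        foreach (Match m in TagPattern.Matches(input))
        {
            result.Add(new KeyValuePair<string, string>(
                m.Groups[1].Value, m.Groups[2].Value));
        }
        return result;
    }

    public static void Main()
    {
        string input =
            "<ZIP>85004</ZIP>\n<CLASSCOD>N</CLASSCOD>\n<SUBJECT>Boiler Replacement</SUBJECT>";
        foreach (var pair in Extract(input))
        {
            Console.WriteLine(pair.Key + " = " + pair.Value);
        }
    }
}
```

Because the value part is [^<]*, elements that contain nested markup, such as <CONTACT> with its embedded <a> link in the sample data, are not matched by this pattern; only elements whose content is plain text are returned.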