I have an HTML string like this (the description element of a Yahoo weather XML feed):
<img src="http://l.yimg.com/a/i/us/we/52/26.gif"/><br />
<b>Current Conditions:</b><br /> Cloudy, 1 C<BR /> <BR />
<b>Forecast:</b><BR /> Mon - Snow. High: -5 Low: -14<br /> Tue - Light Snow. High: -8 Low: -16<br /> <br />
....
I want to get only the High and Low values (e.g. -5, -14, -8, -16).
I tried using HtmlAgilityPack like this:
HtmlDocument htmlDoc = new HtmlDocument();
htmlDoc.LoadHtml(rssDescriptionElement);
List<string> elements = new List<string>();
foreach (HtmlNode element in htmlDoc.DocumentNode.SelectNodes("//br"))
{
    elements.Add(element.NextSibling.InnerText);
}
For the HTML string above, the elements list contains:
"\n"
"\nCloudy, 1 C"
"\n"
"Forecast:"
"\nMon - Snow. High: -5 Low: -14"
"\nTue - Light Snow. High: -8 Low: -16"
"\n"
"\n"
""
"\n(provided by "
"\n"
How can I get the High and Low values (-5, -14, -8, -16) from this list, or is there a different solution?
Answer 0 (score: 1)
Use a regular expression:
(?:High|Low)\s*:\s*(?<num>-?\d+)
and read the group named num. Sample code:
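using System.Collections.Generic;
using System.Text.RegularExpressions;
using HtmlAgilityPack;

// htmlDoc is the HtmlDocument loaded in the question's code
List<string> elements = new List<string>();
var pattern = @"(?:High|Low)\s*:\s*(?<num>-?\d+)";
foreach (HtmlNode element in htmlDoc.DocumentNode.SelectNodes("//br"))
{
    if (element.NextSibling == null) continue; // a trailing <br> may have no sibling
    foreach (Match mc in Regex.Matches(element.NextSibling.InnerText, pattern))
    {
        // keep only the captured number, e.g. "-5"
        elements.Add(mc.Groups["num"].Value);
    }
}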
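If HtmlAgilityPack isn't needed for anything else, a possibly simpler alternative is to run the same pattern over the whole description string, because each "High: n" / "Low: n" pair sits inside a single text run and is never split by a tag. The sketch below assumes the description is already available as a plain string. The class ForecastParser, the method ExtractHighLow and the shortened sample string are illustrative names, not part of the original answer.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.Text.RegularExpressions;

public static class ForecastParser
{
    // Same pattern as in the answer: matches "High: -5" or "Low: -14"
    // and captures the optionally negative integer in the group "num".
    private static readonly Regex HighLow =
        new Regex(@"(?:High|Low)\s*:\s*(?<num>-?\d+)");

    // Hypothetical helper: returns every High/Low value in document order.
    public static List<int> ExtractHighLow(string description)
    {
        var values = new List<int>();
        foreach (Match m in HighLow.Matches(description))
        {
            // Parse with the invariant culture so "-" is always the minus sign.
            values.Add(int.Parse(m.Groups["num"].Value,
                                 NumberStyles.Integer,
                                 CultureInfo.InvariantCulture));
        }
        return values;
    }

    public static void Main()
    {
        // Shortened copy of the description from the question (assumed sample).
        string description =
            "<b>Forecast:</b><BR /> Mon - Snow. High: -5 Low: -14<br /> " +
            "Tue - Light Snow. High: -8 Low: -16<br /> <br />";

        List<int> values = ExtractHighLow(description);
        Console.WriteLine(string.Join(",", values));
    }
}
```

Because the values come back in document order, they alternate High, Low for each forecast day, so values[0] and values[1] belong to the first day, values[2] and values[3] to the second, and so on.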