Getting information from div tags

Time: 2011-12-22 22:18:01

Tags: c# regex

I have a string containing markup of this form:

<div class="c1">text1</div></br>
<div class="c2">text2</div></br>
<div class="c3">text3</div></br>

Using C# and a regular expression, I want to build a NameValueCollection like this:

 { ("c1","text1"),("c2","text2"),("c3","text3") }.

Right now I can only get the text, like this:

 Match match = Regex.Match(inputString, "[^<>]+(?=[<])");

Can someone help me get both the class and the inner text at the same time?

Thanks

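1 answer:

Answer 0: (score: 2)

I agree with the suggestion to use the HTML Agility Pack, but this answers your question as asked. The pattern is commented, and the matches are placed in a dictionary for easy extraction. HTH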

string data = @"
<div class=""c1"">text1</div></br> 
<div class=""c2"">text2</div></br> 
<div class=""c3"">text3</div></br> 
";

string pattern = @"
(?:class\=\x22)  # Match but don't capture the class= quote
(?<Key>\w+)      # Get the key value
(?:\x22>)        # Match but don't capture the closing quote and >
(?<Value>[^<]+)  # Extract the text into Value named capture group
";

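// IgnorePatternWhitespace lets us comment the pattern; it does not change what the regex matches.
// Requires: using System; using System.Linq; using System.Text.RegularExpressions;
Regex.Matches(data, pattern, RegexOptions.IgnorePatternWhitespace)
     .OfType<Match>()
     // Use .Value so the dictionary holds strings rather than Group objects
     .ToDictionary(mt => mt.Groups["Key"].Value, mt => mt.Groups["Value"].Value)
     .ToList()
     .ForEach(kvp => Console.WriteLine("Key {0} Value {1}", kvp.Key, kvp.Value));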

/* Output
Key c1 Value text1
Key c2 Value text2
Key c3 Value text3
*/
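
Since the question asks for a NameValueCollection rather than a dictionary, here is a minimal sketch of the same idea that fills one directly. The class name DivToNameValueCollection and the helper Parse are hypothetical names chosen for illustration; the pattern is a condensed form of the one above and assumes the same simple markup (double-quoted class attribute immediately followed by the text). It is not a general HTML parser.

```csharp
using System;
using System.Collections.Specialized;
using System.Text.RegularExpressions;

public static class DivToNameValueCollection
{
    // Hypothetical helper (not part of the original answer):
    // collects class/text pairs from markup shaped like the sample input.
    public static NameValueCollection Parse(string html)
    {
        const string pattern = @"
            class=\x22(?<Key>\w+)\x22>   # class name inside double quotes, then >
            (?<Value>[^<]+)              # inner text up to the next tag
        ";

        var result = new NameValueCollection();
        foreach (Match m in Regex.Matches(html, pattern, RegexOptions.IgnorePatternWhitespace))
        {
            // Add does not throw on a repeated key; it appends the value to that key's list.
            result.Add(m.Groups["Key"].Value, m.Groups["Value"].Value);
        }
        return result;
    }

    public static void Main()
    {
        string data = @"<div class=""c1"">text1</div></br>
<div class=""c2"">text2</div></br>
<div class=""c3"">text3</div></br>";

        NameValueCollection pairs = Parse(data);
        foreach (string key in pairs.AllKeys)
        {
            Console.WriteLine("Key {0} Value {1}", key, pairs[key]);
        }
    }
}
```

One design difference worth noting: ToDictionary throws an ArgumentException if two divs share the same class, whereas NameValueCollection.Add keeps both values under that key.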