I have a string of HTML from which I need to grab the "[Title|http://www.test.com]" pattern, for example:
"dafasdfasdf adfasd [Test|http://www.test.com/] adf ddasfasdf [SDAF|http://www.madee.com/] assg ad"
I need to replace "[Title|http://www.test.com]" with "<a href='http://www.test.com/'>Title</a>".
What is the best way to do this?
I got close with this:
string test = "dafasdfasdf adfasd [Test|http://www.test.com/] adf ddasfasdf [SDAF|http://www.madee.com/] assg ad ";
string p18 = @"(\[.*?|.*?\])";
MatchCollection mc18 = Regex.Matches(test, p18, RegexOptions.Singleline | RegexOptions.IgnoreCase);
foreach (Match m in mc18)
{
    string value = m.Groups[1].Value;
    string fulltag = value.Substring(value.IndexOf("["), value.Length - value.IndexOf("["));
    Console.WriteLine("text=" + fulltag);
}
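Why this only gets close: in `(\[.*?|.*?\])` the `|` is not escaped, so it acts as alternation rather than a literal pipe. The pattern means "`[` followed by as little as possible" OR "anything up to the next `]`", and the second branch matches from the very start of the string, which is why the `Substring` call is needed to cut off the leading text. A minimal sketch (C# 9 top-level statements; the variable names are illustrative) that prints the raw matches next to the same pattern with the pipe escaped:

```csharp
using System;
using System.Text.RegularExpressions;

string test = "dafasdfasdf adfasd [Test|http://www.test.com/] adf ddasfasdf [SDAF|http://www.madee.com/] assg ad ";

// Original pattern: the bare '|' splits it into two alternatives,
// "\[.*?" and ".*?\]", instead of matching a literal pipe.
MatchCollection raw = Regex.Matches(test, @"(\[.*?|.*?\])");

// Same pattern with the pipe escaped: '|' must now appear literally
// between the brackets, so each match is exactly one [Title|URL] tag.
MatchCollection fixedMatches = Regex.Matches(test, @"(\[.*?\|.*?\])");

foreach (Match m in raw)
    Console.WriteLine("raw:   " + m.Value);   // includes the text before '['
foreach (Match m in fixedMatches)
    Console.WriteLine("fixed: " + m.Value);   // starts at '[' and ends at ']'
```

Escaping the pipe alone is enough for this input; the answer below goes further and captures the parts directly.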
There must be a cleaner way to get both values, i.e. the "Title" part and the URL itself.
Any suggestions?
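One way to get both values separately is named capture groups. This is a minimal sketch, not the asker's or answerer's code; the group names `title` and `url` are illustrative choices. It escapes the pipe and excludes `]` from both parts so a match cannot run past the closing bracket, and `Trim()` drops optional spaces such as those in "[Title | http://...]":

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

string test = "dafasdfasdf adfasd [Test|http://www.test.com/] adf ddasfasdf [SDAF|http://www.madee.com/] assg ad ";

// \[                  literal '['
// (?<title>[^|\]]+)   one or more chars that are neither '|' nor ']'
// \|                  literal '|' (a bare '|' would mean alternation)
// (?<url>[^\]]+)      one or more chars that are not ']'
// \]                  literal ']'
var tagPattern = new Regex(@"\[(?<title>[^|\]]+)\|(?<url>[^\]]+)\]");

var pairs = new List<(string Title, string Url)>();
foreach (Match m in tagPattern.Matches(test))
{
    // Read each part by group name instead of slicing the matched text.
    string title = m.Groups["title"].Value.Trim();
    string url = m.Groups["url"].Value.Trim();
    pairs.Add((title, url));
    Console.WriteLine($"title={title} url={url}");
}
```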
Answer 0 (score: 2)
Replace the pattern:
\[([^|]+)\|[^]]*]
with:
$1
A short explanation:
\[ # match the character '['
( # start capture group 1
[^|]+ # match any character except '|' and repeat it one or more times
) # end capture group 1
\| # match the character '|'
[^]]* # match any character except ']' and repeat it zero or more times
] # match the character ']'
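The commented breakdown above is itself valid .NET regex syntax once `RegexOptions.IgnorePatternWhitespace` is set: unescaped whitespace is ignored and `#` starts a comment. A minimal sketch (C# 9 top-level statements) that runs the annotated form and checks it against the compact one-line pattern:

```csharp
using System;
using System.Text.RegularExpressions;

string test = "dafasdfasdf adfasd [Test|http://www.test.com/] adf ddasfasdf [SDAF|http://www.madee.com/] assg ad ";

// With IgnorePatternWhitespace, spaces/newlines outside character classes
// are ignored and everything after '#' on a line is a comment.
var verbose = new Regex(@"
    \[        # match the character '['
    (         # start capture group 1
      [^|]+   # match any character except '|', one or more times
    )         # end capture group 1
    \|        # match the character '|'
    [^]]*     # match any character except ']', zero or more times
    ]         # match the character ']'
", RegexOptions.IgnorePatternWhitespace);

string fromVerbose = verbose.Replace(test, "$1");
string fromCompact = Regex.Replace(test, @"\[([^|]+)\|[^]]*]", "$1");
Console.WriteLine(fromVerbose);
Console.WriteLine(fromVerbose == fromCompact);
```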
A C# demo would look like:
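string test = "dafasdfasdf adfasd [Test|http://www.test.com/] adf ddasfasdf [SDAF|http://www.madee.com/] assg ad ";
string adjusted = Regex.Replace(test, @"\[([^|]+)\|[^]]*]", "$1"); // needs using System.Text.RegularExpressions;
Console.WriteLine(adjusted); // dafasdfasdf adfasd Test adf ddasfasdf SDAF assg ad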
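The replacement above keeps only the title. If the URL should also end up in the output, assuming the desired result is an HTML link of the form `<a href='URL'>Title</a>`, the URL can be captured as well and both groups referenced in the replacement string. A minimal sketch; the group names `title` and `url` are illustrative, and `\s*` absorbs optional spaces around the parts:

```csharp
using System;
using System.Text.RegularExpressions;

string test = "dafasdfasdf adfasd [Test|http://www.test.com/] adf ddasfasdf [SDAF|http://www.madee.com/] assg ad ";

// Capture title and URL in named groups; the lazy quantifiers plus \s*
// keep surrounding spaces (as in "[Title | http://...]") out of the groups.
var tag = new Regex(@"\[\s*(?<title>[^|\]]+?)\s*\|\s*(?<url>[^\]]+?)\s*\]");

// ${url} and ${title} substitute the named groups in the replacement.
string linked = tag.Replace(test, "<a href='${url}'>${title}</a>");
Console.WriteLine(linked);
```

If titles or URLs can contain characters such as `'` or `<`, a `MatchEvaluator` that passes each group through `System.Net.WebUtility.HtmlEncode` would be safer than a plain replacement string.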