I have a string containing HTML:
<div class="ExternalClass6FC23FEAF7454B3A8006CF7E1D2257B8">
<audio src="/sites/audioblogs/Group2Doc/0.021950338035821915.wav" controls="controls"></audio><br/><img src="/sites/audioblogs/Group2Doc/20140103_152938.jpg" alt=""/></div>
I only need the values of the source (src) attributes. I am currently trying to use Regex.Match.
Are there any other options?
Thanks, Sachin
Answer 0 (score: 2)
I would use HtmlAgilityPack to parse the HTML rather than a regular expression:
// Requires the HtmlAgilityPack package and `using System.Linq;`
var doc = new HtmlAgilityPack.HtmlDocument();
doc.LoadHtml(html); // html is your string
// First <audio> element that actually has a src attribute
var audio = doc.DocumentNode.Descendants("audio")
    .FirstOrDefault(n => n.Attributes["src"] != null);
string src = null;
if (audio != null)
    src = audio.Attributes["src"].Value;
Result: /sites/audioblogs/Group2Doc/0.021950338035821915.wav
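The snippet above reads only the first audio element, while the HTML in the question also has an img with a src. If every src value is wanted, the same approach can be widened. The following is a minimal sketch, not part of the original answer; the class name `HtmlSrcReader` and method name `GetAllSrcValues` are made up for illustration, and it assumes the HtmlAgilityPack package is referenced.

```csharp
using System.Collections.Generic;
using System.Linq;
using HtmlAgilityPack;

// Hypothetical helper: the names are illustrative, not from the answer.
public static class HtmlSrcReader
{
    // Returns the src value of every node that carries a src attribute,
    // in document order.
    public static List<string> GetAllSrcValues(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        return doc.DocumentNode
            .Descendants()                            // all nodes below the root
            .Where(n => n.Attributes["src"] != null)  // keep nodes that have src
            .Select(n => n.Attributes["src"].Value)
            .ToList();
    }
}
```

For the HTML in the question this should return both the .wav path and the .jpg path. To restrict it to particular tags, pass a tag name to `Descendants`, as the answer does with "audio".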
Answer 1 (score: 0)
string yourFullHtmlstring = ".....";
// Normalize quoting: turn every double quote into a single quote
yourFullHtmlstring = yourFullHtmlstring.Replace("\"", "'");
// Split on the attribute prefix; each piece after the first starts with a src value
string[] arr = yourFullHtmlstring.Split(new string[] { "src='" }, StringSplitOptions.None);
// Trim each piece down to the src value only.
// Start from 1 because arr[0] is the text before the first src
for (int i = 1; i < arr.Length; i++)
{
    int end = arr[i].IndexOf('\'');
    // Skip pieces with no closing quote instead of throwing on Substring(0, -1)
    if (end >= 0)
        arr[i] = arr[i].Substring(0, end);
}
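Since the question mentions Regex.Match, here is what a regex-only variant of the same idea might look like, using Regex.Matches so that every src is collected without rewriting the quotes first. This is a rough sketch rather than a robust HTML parser; the class name `SrcExtractor` and method name `ExtractSrcValues` are hypothetical. It assumes the attribute values are quoted, and it will also match attributes such as data-src.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical helper: the names are illustrative, not from the thread.
public static class SrcExtractor
{
    // Group 1 captures the opening quote (" or '), group 2 the value;
    // the backreference \1 requires the same quote character to close it.
    private static readonly Regex SrcPattern = new Regex(
        @"\bsrc\s*=\s*([""'])(.*?)\1",
        RegexOptions.IgnoreCase | RegexOptions.Singleline);

    // Returns every quoted src value found in the input, in order of appearance.
    public static List<string> ExtractSrcValues(string html)
    {
        var values = new List<string>();
        foreach (Match m in SrcPattern.Matches(html))
        {
            values.Add(m.Groups[2].Value);
        }
        return values;
    }
}
```

Calling `SrcExtractor.ExtractSrcValues(yourFullHtmlstring)` gives a list with one entry per src attribute. For anything beyond simple, well-formed markup, a real parser such as HtmlAgilityPack is the safer choice.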