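I want HTML-Agility-Pack to close every open 'option' tag while still keeping its inner text. My goal is to capture the following:
The C# code I wrote depends on the option closing tag appearing after the inner text.
Here is the original HTML: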
<select id="Province" >
<option value=""> -- Select province --</option>
<option value="1">Alberta
<option value="2">British Columbia
<option value="3">Manitoba
<option value="4">New Brunswick
<option value="5">Newfoundland
<option value="6">Northwest Territories
<option value="7">Nova Scotia
<option value="8">Nunavut
<option value="9">Ontario
<option value="10">Prince Edward Island
<option value="11">Quebec
<option value="12">Saskatchewan
<option value="13">Yukon
</select>
The HTML as formatted by HTML-Agility-Pack:
<select id="Province" >
<option value=""> -- Select province --</option>
<option value="1"></option>Alberta
<option value="2"></option>British Columbia
<option value="3"></option>Manitoba
<option value="4"></option>New Brunswick
<option value="5"></option>Newfoundland
<option value="6"></option>Northwest Territories
<option value="7"></option>Nova Scotia
<option value="8"></option>Nunavut
<option value="9"></option>Ontario
<option value="10"></option>Prince Edward Island
<option value="11"></option>Quebec
<option value="12"></option>Saskatchewan
<option value="13"></option>Yukon
</select>
As you can see, the inner text is left outside the option element. Is it possible to add the closing tag after the inner text instead?
For example:
<option value="1">Alberta</option>
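One way to get that output without relying on HTML-Agility-Pack's parser is to patch the raw markup before loading it. The sketch below swaps in a regular expression (this is not an HTML-Agility-Pack feature): for each option start tag it takes the text up to the next '<' and, if that text is not already followed by '</option', inserts '</option>' directly after the label while keeping the line break. It assumes labels contain no '<' characters and that the markup is as simple as the sample above; OptionCloser and CloseOpenOptions are hypothetical names.

```csharp
using System.Text.RegularExpressions;

public static class OptionCloser
{
    // Group 1: an <option ...> start tag. Group 2: the text after it, up to the next '<'.
    // The atomic group (?>...) prevents backtracking into a shorter label, so the
    // negative lookahead only accepts options whose text is NOT already followed by </option.
    private static readonly Regex UnclosedOption = new Regex(
        @"(<option\b[^>]*>)((?>[^<]*))(?!</option)",
        RegexOptions.IgnoreCase);

    // Hypothetical helper: returns html with </option> inserted after each unclosed label.
    public static string CloseOpenOptions(string html)
    {
        return UnclosedOption.Replace(html, m =>
        {
            string text = m.Groups[2].Value;
            string label = text.TrimEnd();                  // e.g. "Alberta"
            string trailing = text.Substring(label.Length); // e.g. the line break after it
            return m.Groups[1].Value + label + "</option>" + trailing;
        });
    }
}
```

The patched string can then be written out or passed to htmlDoc.LoadHtml as before. Options that are already closed, such as the "-- Select province --" entry, are left untouched, so running the helper twice gives the same result.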
Here is the C# code I use to parse the HTML:
static void LoadProvinces()
{
    // Read the HTML file and save it to the string 'rawProvinces'
    string rawProvinces;
    using (var myFile = new System.IO.StreamReader("ProvincesCheckout.htm"))
    {
        rawProvinces = myFile.ReadToEnd();
    }

    // This tells HTML-Agility-Pack to close all open option tags
    HtmlNode.ElementsFlags["option"] = HtmlElementFlag.Closed;

    // Load the rawProvinces string into HTML-Agility-Pack
    HtmlDocument htmlDoc = new HtmlDocument();
    htmlDoc.LoadHtml(rawProvinces);

    // Convert the parsed HTML to the string 'parsedHtml' and save it to 'hap.htm'
    string parsedHtml = htmlDoc.DocumentNode.OuterHtml;
    using (var file = new System.IO.StreamWriter("hap.htm"))
    {
        file.WriteLine(parsedHtml);
    }
}
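If the end goal is just the value and label of each option, another route is to read them straight from the raw text, so the code no longer depends on where the closing tag ends up. This is a regex-based sketch, not part of HTML-Agility-Pack's API; ProvinceReader and ReadOptions are hypothetical names, and it assumes double-quoted value attributes and labels that contain no '<'.

```csharp
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class ProvinceReader
{
    // Group 1: the value attribute. Group 2: the label text after the start tag,
    // stopping at the next '<' whether or not a closing </option> follows.
    private static readonly Regex OptionPattern = new Regex(
        @"<option\b[^>]*\bvalue=""([^""]*)""[^>]*>([^<]*)",
        RegexOptions.IgnoreCase);

    // Hypothetical helper: returns (value, label) pairs in document order.
    public static List<KeyValuePair<string, string>> ReadOptions(string html)
    {
        var result = new List<KeyValuePair<string, string>>();
        foreach (Match m in OptionPattern.Matches(html))
        {
            string value = m.Groups[1].Value;
            string label = m.Groups[2].Value.Trim(); // drop surrounding spaces and line breaks
            result.Add(new KeyValuePair<string, string>(value, label));
        }
        return result;
    }
}
```

For the sample HTML this should yield pairs such as ("1", "Alberta") and ("13", "Yukon"), plus ("", "-- Select province --") for the placeholder entry, which callers may want to skip.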
Answer 0 (score: 0)
For some reason it doesn't work, even though it should. You can, however, do this yourself with the String class and its methods:
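// Get all option elements
HtmlNodeCollection nodes = htmlDoc.DocumentNode.SelectNodes("//option");
if (nodes != null)
{
    foreach (HtmlNode node in nodes)
    {
        // The NextSibling is the text we want to follow with </option> (e.g. "Alberta");
        // skip options with no text sibling or only whitespace after them
        HtmlNode next = node.NextSibling;
        if (next == null || next.NodeType != HtmlNodeType.Text || string.IsNullOrWhiteSpace(next.OuterHtml))
            continue;
        // Position right after the sibling text, ignoring its trailing whitespace
        // (IndexOf finds the first occurrence, so each label should be unique)
        int start = rawProvinces.IndexOf(next.OuterHtml, StringComparison.Ordinal);
        if (start < 0)
            continue;
        int nextPosition = start + next.OuterHtml.TrimEnd().Length;
        // Check if there isn't already a </option> element (safe near the end of the string)
        if (string.CompareOrdinal(rawProvinces, nextPosition, "</option", 0, 8) != 0)
        {
            // Add the element
            rawProvinces = rawProvinces.Insert(nextPosition, "</option>");
        }
    }
}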