Here is an excerpt of the HTML:
<td style="border-width : 0px; padding: 5px;">
<p class="rvps2">
<img alt="Main Application Window" style="float: left; padding : 1px; margin : 5px 5px;" src="lib/DLG_MSA.png">
<span class="rvts15">When you start the </span>
<span class="rvts17">Meeting Schedule Assistant</span>
<span class="rvts15"> program, this dialogue is displayed. Use the menu options to setup program options and create reports. Creating and modifying schedules is a really simple process. Work through the </span>
<span class="rvts16">Options</span>
<span class="rvts15"> menu first in the order described below and you should not have any problems. At the bottom of the dialogue you can see what the active schedule type is. This can be changed by using the </span>
<a class="rvts27" href="msa-options-settings.html#SCHEDULE_TYPES">Options</a>
<span class="rvts15"> dialogue. </span>
<span class="rvts17">The active schedule type only applies to regular reports and </span>
<span class="rvts26">not</span>
<span class="rvts17"> Christian Life and Ministry meeting schedules.</span>
</p>
<p class="rvps5"><span class="rvts15">Please see the </span>
<a class="rvts20" href="contact-form.html">Contacting Me</a>
<span class="rvts15"> help page if you want to contact me about anything concerning this program. Please see the </span>
<a class="rvts20" href="msa-revision-history.html">Revision History</a>
<span class="rvts15"> to see what the latest features and changes are.</span>
</p>
</td>
It looks like this:
I have written a C# console application that reads the HTML files and wraps every img tag in an a tag, but the result is not right.
I end up with:
<td style="border-width : 0px; padding: 5px;">
<p class="rvps2">
<a href="lib/DLG_MSA.png">
<img src="lib/DLG_MSA.png" data-lightbox="DLG_MSA" data-title="Main Application Window" data-alt="Main Application Window" alt="Main Application Window" style="float: left; padding : 1px; margin : 5px 5px;">
</a>
</p>
<p class="rvps5">
<span class="rvts15">Please see the </span>
<a class="rvts20" href="contact-form.html">Contacting Me</a>
<span class="rvts15"> help page if you want to contact me about anything concerning this program. Please see the </span>
<a class="rvts20" href="msa-revision-history.html">Revision History</a>
<span class="rvts15"> to see what the latest features and changes are.</span>
</p>
</td>
It has lost all the span elements inside the first p tag. This is the code from my console application, which uses the HTML Agility Pack:
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;
using System.Text;
using System.Threading.Tasks;
using HtmlAgilityPack;

namespace ConsoleApp1
{
    class Program
    {
        static void Main(string[] args)
        {
            string[] fileEntries = Directory.GetFiles(@"D:\My Programs\2017\MeetSchedAssist\HelpNDoc\HTML", "*.html");
            int i = 0, iCount = fileEntries.Count();
            foreach (string fileName in fileEntries)
            {
                i++;
                Console.WriteLine($"Processing file {i} of {iCount}: {fileName}");
                WrapImages(fileName);
            }
        }

        static void WrapImages(string strHtmlFile)
        {
            var doc = new HtmlAgilityPack.HtmlDocument();
            doc.Load(strHtmlFile);
            // traverse html to find and replace image node
            var q = new Queue<HtmlNode>();
            q.Enqueue(doc.DocumentNode);
            while (q.Count > 0)
            {
                var item = q.Dequeue();
                HtmlNode node = null;
                // found image, need to wrap it!
                if (item.Name == "img")
                {
                    string strNode = "";
                    string imageUrl = "";
                    string imageAlt = "";
                    string imageStyle = "";
                    string imageClass = "";
                    string imageMap = "";
                    // Test for having width or height
                    string imageW = "";
                    string imageH = "";
                    try
                    {
                        imageW = item.Attributes["width"].Value;
                        imageH = item.Attributes["height"].Value;
                        Console.WriteLine($"DIMS — {Path.GetFileName(strHtmlFile)} — {item.OuterHtml}");
                    }
                    catch { }
                    if (item.ParentNode != null)
                    {
                        try
                        {
                            imageUrl = item.Attributes["src"].Value;
                            imageAlt = item.Attributes["alt"].Value;
                            imageStyle = item.Attributes["style"].Value;
                            imageClass = item.Attributes["class"].Value;
                            imageMap = item.Attributes["usemap"].Value;
                        }
                        catch
                        {
                            if (imageUrl == "" || imageAlt == "")
                            {
                                Console.WriteLine($"ALT — {Path.GetFileName(strHtmlFile)} — {item.OuterHtml}");
                            }
                        }
                        // create new node (each attribute starts with a space so they stay separated)
                        strNode = $"<a href=\"{imageUrl}\"><img src=\"{imageUrl}\"";
                        strNode += $" data-lightbox=\"{Path.GetFileNameWithoutExtension(imageUrl)}\"";
                        strNode += $" data-title=\"{imageAlt}\"";
                        strNode += $" data-alt=\"{imageAlt}\"";
                        if (imageAlt != "")
                            strNode += $" alt=\"{imageAlt}\"";
                        if (imageStyle != "")
                            strNode += $" style=\"{imageStyle}\"";
                        if (imageClass != "")
                            strNode += $" class=\"{imageClass}\"";
                        if (imageMap != "")
                            strNode += $" usemap=\"{imageMap}\"";
                        strNode += "></a>";
                        node = HtmlNode.CreateNode(strNode);
                        // set new html
                        item.ParentNode.InnerHtml = node.OuterHtml;
                    }
                    else
                    {
                        Console.WriteLine($"NULL — {strHtmlFile} — {item.OuterHtml}");
                    }
                }
                else
                {
                    // traverse children
                    foreach (var child in item.ChildNodes) { q.Enqueue(child); }
                }
            }
            doc.Save(@"d:\TestHtmlOutput\" + Path.GetFileName(strHtmlFile));
        }
    }
}
The problem is with this line of code:
item.ParentNode.InnerHtml = node.OuterHtml;
Taking the example above, item.ParentNode.InnerHtml for the img returns:
<img alt="Main Application Window" style="float: left; padding : 1px; margin : 5px 5px;" src="lib/DLG_MSA.png">
<span class="rvts15">When you start the </span>
<span class="rvts17">Meeting Schedule Assistant</span>
<span class="rvts15"> program, this dialogue is displayed. Use the menu options to setup program options and create reports. Creating and modifying schedules is a really simple process. Work through the </span>
<span class="rvts16">Options</span>
<span class="rvts15"> menu first in the order described below and you should not have any problems. At the bottom of the dialogue you can see what the active schedule type is. This can be changed by using the </span>
<a class="rvts27" href="msa-options-settings.html#SCHEDULE_TYPES">Options</a>
<span class="rvts15"> dialogue. </span>
<span class="rvts17">The active schedule type only applies to regular reports and </span>
<span class="rvts26">not</span>
<span class="rvts17"> Christian Life and Ministry meeting schedules.</span>
All of this gets replaced with the new anchor. What I want to end up with is:
<td style="border-width : 0px; padding: 5px;">
<p class="rvps2">
<a href="lib/DLG_MSA.png">
<img src="lib/DLG_MSA.png" data-lightbox="DLG_MSA" data-title="Main Application Window" data-alt="Main Application Window" alt="Main Application Window" style="float: left; padding : 1px; margin : 5px 5px;">
</a>
<span class="rvts15">When you start the </span>
<span class="rvts17">Meeting Schedule Assistant</span>
<span class="rvts15"> program, this dialogue is displayed. Use the menu options to setup program options and create reports. Creating and modifying schedules is a really simple process. Work through the </span>
<span class="rvts16">Options</span>
<span class="rvts15"> menu first in the order described below and you should not have any problems. At the bottom of the dialogue you can see what the active schedule type is. This can be changed by using the </span>
<a class="rvts27" href="msa-options-settings.html#SCHEDULE_TYPES">Options</a>
<span class="rvts15"> dialogue. </span>
<span class="rvts17">The active schedule type only applies to regular reports and </span>
<span class="rvts26">not</span>
<span class="rvts17"> Christian Life and Ministry meeting schedules.</span>
</p>
<p class="rvps5"><span class="rvts15">Please see the </span>
<a class="rvts20" href="contact-form.html">Contacting Me</a>
<span class="rvts15"> help page if you want to contact me about anything concerning this program. Please see the </span>
<a class="rvts20" href="msa-revision-history.html">Revision History</a>
<span class="rvts15"> to see what the latest features and changes are.</span>
</p>
</td>
Another snippet with an img tag:
<img alt="Notice Board" style="vertical-align: text-bottom; padding : 1px; margin : 0px 5px;" src="lib/DLG_SRREditor_NoticeBoard.gif">
<span class="rvts15">This option is only enabled when using one of the advanced modes. Set this option if you want to include a schedule with no highlighting for the notice board.</span>
How can I resolve this? I just want to wrap the img tags in a tags without touching the rest of the HTML content.
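One possible direction, offered only as a minimal sketch and not as a definitive fix: since assigning item.ParentNode.InnerHtml overwrites every sibling of the img, the img node alone can be swapped for the new anchor with HtmlNode.ReplaceChild. The class name ImageWrapper, the sample HTML string in Main, and the choice to skip images that already sit inside an a tag are assumptions made for illustration; they are not part of the original program.

```csharp
using System;
using System.IO;
using System.Linq;
using HtmlAgilityPack;

// Hypothetical helper class, named for illustration only.
static class ImageWrapper
{
    // Wraps each <img> in <a href="{src}"> and leaves sibling nodes untouched.
    public static string WrapImages(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        // Snapshot the matches with ToList() so the tree can be changed while looping.
        // Assumption: images already inside an anchor should be left alone.
        var images = doc.DocumentNode.Descendants("img")
            .Where(img => !img.Ancestors("a").Any())
            .ToList();

        foreach (var img in images)
        {
            // GetAttributeValue returns the default when the attribute is missing,
            // so no try/catch is needed for absent attributes.
            string src = img.GetAttributeValue("src", "");
            string alt = img.GetAttributeValue("alt", "");

            // Copy the image (keeping alt, style, class, usemap) and add the lightbox data.
            var wrapped = img.CloneNode(true);
            wrapped.SetAttributeValue("data-lightbox", Path.GetFileNameWithoutExtension(src));
            wrapped.SetAttributeValue("data-title", alt);
            wrapped.SetAttributeValue("data-alt", alt);

            var anchor = doc.CreateElement("a");
            anchor.SetAttributeValue("href", src);
            anchor.AppendChild(wrapped);

            // Replace only the img node; the parent's other children are not touched.
            img.ParentNode.ReplaceChild(anchor, img);
        }

        return doc.DocumentNode.OuterHtml;
    }
}

static class Program
{
    static void Main()
    {
        // Shortened sample input, made up for this sketch.
        string html = "<p class=\"rvps2\"><img alt=\"Main Application Window\" src=\"lib/DLG_MSA.png\">"
                    + "<span class=\"rvts15\">When you start the </span></p>";
        Console.WriteLine(ImageWrapper.WrapImages(html));
    }
}
```

To apply it to the files, the same method could be called with File.ReadAllText on each input file and the result written out with File.WriteAllText. Working on the node instead of on an HTML string also avoids building the anchor markup by hand.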