WebBrowser Control
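The WebBrowser control seems to rearrange the attributes in HTML tags when webBrowser1.DocumentText is set.
I wonder whether I am missing some rendering mode or document encoding. To see my problem, just add a RichTextBox control
(txt_htmlBody) and a WebBrowser control (webBrowser1) to a Windows Form.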
Add the webBrowser1 WebBrowser control and give it an event handler, webBrowser1_DocumentCompleted.
I use this handler to attach a mouse click event to the web browser control.
private void webBrowser1_DocumentCompleted(object sender, WebBrowserDocumentCompletedEventArgs e)
{
    // Attach an event to handle mouse clicks on the web browser
    this.webBrowser1.Document.Body.MouseDown += new HtmlElementEventHandler(Body_MouseDown);
}
In the mouse click event, we find out which element was clicked, like this:
private void Body_MouseDown(Object sender, HtmlElementEventArgs e)
{
    // Get the clicked HTML element
    HtmlElement elem = webBrowser1.Document.GetElementFromPoint(e.ClientMousePosition);
    if (elem != null)
    {
        highLightElement(elem);
    }
}
private void highLightElement(HtmlElement elem)
{
    int len = this.txt_htmlBody.TextLength;
    int index = 0;
    // Lower-case both strings so the search is not case-sensitive
    string textToSearch = this.txt_htmlBody.Text.ToLower();
    string textToFind = elem.OuterHtml.ToLower();
    int lastIndex = textToSearch.LastIndexOf(textToFind);
    // We can't find the text, because the WebBrowser control has rearranged the attributes in the <img> tag
    // What's rendered by the web browser: "<img border=0 alt=\"\" src=\"images/promo-green2_01_04.jpg\" width=393 height=30>"
    // What was passed to the web browser from the text box: <img src="images/PROMO-GREEN2_01_04.jpg" width="393" height="30" border="0" alt=""/>
    // As you can see, I will never be able to find my data in the source because the WebBrowser has changed it
}
Add the txt_htmlBody RichTextBox to the form, and handle its TextChanged event so that webBrowser1.DocumentText is updated whenever the text in the RichTextBox (txt_htmlBody) changes.
private void txt_htmlBody_TextChanged(object sender, EventArgs e)
{
    try
    {
        webBrowser1.DocumentText = txt_htmlBody.Text.Replace("\n", String.Empty);
    }
    catch (Exception ex)
    {
        MessageBox.Show(ex.Message);
    }
}
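When you run the program, copy the sample HTML below into txt_htmlBody, then click the image on the right and debug highLightElement. My comments show why I cannot find the specified text in the search string: the WebBrowser control rearranges the attributes.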
<img src="images/PROMO-GREEN2_01_04.jpg" width="393" height="30" border="0" alt=""/>
Does anyone know how to make the WebBrowser control render my HTML as-is?
Thanks for your time.
Answer 0 (score: 1)
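You cannot expect the processed HTML you get back via element.OuterHtml to be 1:1 identical to the original source. It will almost never be the same, regardless of rendering mode.
However, although the attributes may have been rearranged, their names and values are still the same, so you just need to improve your search logic (for example, by walking the DOM tree, or simply by enumerating the elements via HtmlDocument.All and checking their attributes).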
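One way to make the search tolerant of attribute order is sketched below. This is not the answerer's code: it works on the two strings rather than on the DOM, and TagMatcher, TryParseTag and FindInSource are hypothetical names invented for this illustration. It compares only the first opening tag of the rendered OuterHtml against each tag in the source, ignoring attribute order, quoting and case, and it assumes the browser has neither added nor rewritten attribute values (for example by resolving relative URLs).

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical helper (not part of the WebBrowser API): order-insensitive tag matching.
static class TagMatcher
{
    // An opening tag: captures the tag name and the raw attribute text.
    private static readonly Regex TagPattern = new Regex(
        @"<([a-zA-Z][\w:-]*)((?:""[^""]*""|'[^']*'|[^>""'])*)>");

    // name=value, where the value is double-quoted, single-quoted or unquoted.
    private static readonly Regex AttributePattern = new Regex(
        @"([\w:-]+)\s*=\s*(?:""([^""]*)""|'([^']*)'|([^\s""'>]+))");

    // Parses the first opening tag in html into a name and an attribute map.
    public static bool TryParseTag(string html, out string name,
                                   out Dictionary<string, string> attrs)
    {
        name = null;
        attrs = new Dictionary<string, string>(StringComparer.OrdinalIgnoreCase);
        Match tag = TagPattern.Match(html);
        if (!tag.Success) return false;

        name = tag.Groups[1].Value;
        // Drop the trailing slash of a self-closing tag such as <img ... />.
        string attrText = tag.Groups[2].Value.TrimEnd().TrimEnd('/');
        foreach (Match m in AttributePattern.Matches(attrText))
        {
            string value = m.Groups[2].Success ? m.Groups[2].Value
                         : m.Groups[3].Success ? m.Groups[3].Value
                         : m.Groups[4].Value;
            attrs[m.Groups[1].Value] = value;
        }
        return true;
    }

    // Returns the index in source of the first tag equivalent to the rendered
    // element's opening tag, or -1 if no tag matches.
    public static int FindInSource(string source, string renderedOuterHtml)
    {
        string wantedName;
        Dictionary<string, string> wanted;
        if (!TryParseTag(renderedOuterHtml, out wantedName, out wanted)) return -1;

        foreach (Match candidate in TagPattern.Matches(source))
        {
            string name;
            Dictionary<string, string> attrs;
            if (!TryParseTag(candidate.Value, out name, out attrs)) continue;
            if (!string.Equals(name, wantedName, StringComparison.OrdinalIgnoreCase)) continue;
            if (attrs.Count != wanted.Count) continue;

            bool same = true;
            foreach (KeyValuePair<string, string> pair in wanted)
            {
                string other;
                if (!attrs.TryGetValue(pair.Key, out other) ||
                    !string.Equals(other, pair.Value, StringComparison.OrdinalIgnoreCase))
                {
                    same = false;
                    break;
                }
            }
            if (same) return candidate.Index;
        }
        return -1;
    }
}

static class Program
{
    static void Main()
    {
        // The two strings quoted in the question's comments.
        string source = @"<img src=""images/PROMO-GREEN2_01_04.jpg"" width=""393"" height=""30"" border=""0"" alt=""""/>";
        string rendered = @"<img border=0 alt="""" src=""images/promo-green2_01_04.jpg"" width=393 height=30>";
        Console.WriteLine(TagMatcher.FindInSource(source, rendered));
    }
}
```

In highLightElement, the LastIndexOf call could then be replaced by something like TagMatcher.FindInSource(txt_htmlBody.Text, elem.OuterHtml). Several identical tags in the source would all match, so the first hit is not necessarily the clicked element; telling them apart would need extra information such as the element's position in the DOM.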