Question

I load HTML from another page in order to extract and display data from it:

$.get('http://example.org/205.html', function (html) {
    console.log( $(html).find('#c1034') );
});

This works, but because of $(html) my browser tries to load the images linked in 205.html. Those images do not exist on my domain, so I get a lot of 404 errors.

Is there a way to parse the page the way $(html) does, but without loading the whole page into my browser?

Answer 1

Use a regular expression to remove all <img> tags:

 html = html.replace(/<img[^>]*>/g,"");
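
As a minimal sketch of the same idea, the pattern can be given the `i` flag so that upper-case tags such as <IMG> are removed as well. The helper name stripImgTags and the sample string are made up for illustration. Like the original, this will stop early if an attribute value itself contains a `>` character.

```javascript
// Hypothetical helper: the answer's regex with the case-insensitive flag added.
function stripImgTags(html) {
    return html.replace(/<img[^>]*>/gi, '');
}

// Illustrative input only; not taken from the page in the question.
var sample = '<div id="c1034"><IMG SRC="a.png">text<img src="b.png" /></div>';
var stripped = stripImgTags(sample);
console.log(stripped); // <div id="c1034">text</div>
```

The stripped string can then be passed to $(...) as in the question.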

Answer 2

Actually, if you look at the jQuery documentation, it says that you can pass an "owner document" as the second argument to $.

So what we can do is create a virtual document, so that the browser does not automatically load the images present in the supplied HTML:

var ownerDocument = document.implementation.createHTMLDocument('virtual');
$(html, ownerDocument).find('.some-selector');

Answer 3

Parsing the HTML with the following approach will load the images automatically:

var wrapper = document.createElement('div'),
    html = '.....';
wrapper.innerHTML = html;

If you use DOMParser to parse the HTML, the images are not loaded automatically. For details, see https://github.com/panzi/jQuery-Parse-HTML/blob/master/jquery.parsehtml.js.
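
A rough sketch of the DOMParser approach follows. It is not the code from the linked plugin. The helper name parseInert and its optional parser argument are assumptions added for illustration; the argument exists only so the function can be exercised outside a browser. The document returned by parseFromString is not attached to the page, so its <img> elements should not trigger requests.

```javascript
// Hypothetical helper: parse an HTML string into a detached document.
// In a browser, `parser` can be omitted and the built-in DOMParser is used.
function parseInert(html, parser) {
    parser = parser || new DOMParser();
    return parser.parseFromString(html, 'text/html');
}

// Browser usage with the URL and id from the question:
// $.get('http://example.org/205.html', function (html) {
//     console.log($(parseInert(html)).find('#c1034'));
// });
```

Because the result is a whole document rather than a fragment, selectors are run against it with $(doc).find(...) or doc.querySelector(...).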

Answer 4

Sorry to resurrect an old question, but this is the first result when searching for how to stop parsed HTML from loading external assets.

I took Nik Ahmad Zainalddin's answer, but it has a weakness: anything that sits between two <script> blocks is wiped out as well.

<script>
</script>
Inert text
<script>
</script>

In the example above, Inert text is removed along with the script tags. I ended up doing the following:

html = html.replace(/<\s*(script|iframe)[^>]*>(?:[^<]*<)*?\/\1>/g, "").replace(/(<(\b(img|style|head|link)\b)(([^>]*\/>)|([^\7]*(<\/\2[^>]*>)))|(<\bimg\b)[^>]*>|(\b(background|style)\b=\s*"[^"]*"))/g, "");

In addition, I added the ability to remove iframes.
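
To make the weakness concrete, here is a small sketch that uses two simplified stand-in patterns rather than the regexes from either answer. The greedy one runs from the first <script> to the last </script>; the lazy one stops at the nearest closing tag. The names sample, greedy, and lazy are made up for illustration.

```javascript
// Same shape as the example above: inert text between two script blocks.
var sample = '<script>\n</script>\nInert text\n<script>\n</script>';

// Greedy: [\s\S]* runs to the LAST </script>, taking "Inert text" with it.
var greedy = /<script[^>]*>[\s\S]*<\/script>/gi;
// Lazy: [\s\S]*? stops at the NEAREST </script>, so each block is removed on its own.
var lazy = /<script[^>]*>[\s\S]*?<\/script>/gi;

var greedyResult = sample.replace(greedy, '');
var lazyResult = sample.replace(lazy, '');

console.log(JSON.stringify(greedyResult)); // ""
console.log(JSON.stringify(lazyResult));   // "\nInert text\n"
```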

Hope this helps someone.

Answer 5

You can use jQuery's remove() method on the selected image elements:

console.log( $(html).find('img').remove().end().find('#c1034') );

Or remove them from the HTML string, with something like:

console.log( $(html.replace(/<img[^>]*>/g,"")) );

As for background images, you can do something like this:

$(html).filter(function() {
    return $(this).css('background-image') !== ''; 
}).remove();

Answer 6

The following regex removes all occurrences of <head>, <link>, <script>, <style> and <img>, as well as background and style attributes, from the data string returned by an ajax load.

html = html.replace(/(<(\b(img|style|script|head|link)\b)(([^>]*\/>)|([^\7]*(<\/\2[^>]*>)))|(<\bimg\b)[^>]*>|(\b(background|style)\b=\s*"[^"]*"))/g,"");

Test the regex: https://regex101.com/r/nB1oP5/1

I hope there is a better way to solve this (other than using a regex replace).

Answer 7

Instead of removing all img elements entirely, you can use the following regex to remove all src attributes:

html = html.replace(/src="[^"]*"/ig, "");
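
The pattern above only handles double-quoted values and will also cut the `src="..."` part out of an attribute such as data-src. A slightly more defensive variant might look like the sketch below. The helper name stripSrcAttributes and the sample markup are assumptions for illustration. Unquoted values are still not handled, and any matching text outside of tags would be removed too.

```javascript
// Hypothetical helper; an assumed variant of the answer's regex.
function stripSrcAttributes(html) {
    // Requiring whitespace before "src" leaves attributes like data-src alone;
    // the alternation accepts double- or single-quoted values.
    return html.replace(/\ssrc\s*=\s*(?:"[^"]*"|'[^']*')/gi, '');
}

// Illustrative input only.
var sample = '<img class="a" src="x.png" data-src=\'y.png\'><img SRC=\'z.png\' alt="z">';
var withoutSrc = stripSrcAttributes(sample);
console.log(withoutSrc);
// <img class="a" data-src='y.png'><img alt="z">
```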

jQuery: parse HTML without loading images
