Question

I have a variable that contains an HTML string. The string includes this specific piece of code:

<a href="http://www.pheedo.com/click.phdo?s=xxxxxxxx&amp;p=1"><img border="0" src="http://www.pheedo.com/img.phdo?s=xxxxxxxxxx&amp;p=1" style="border: 0pt none ;" alt=""/></a>

How can I remove it using a regular expression? Basically, I want to look for the pheedo.com domain and strip out the link and image tags.

Thanks

Answer 1

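This is a non-answer: don't manipulate arbitrary HTML with regular expressions! HTML is a very complex specification, and parsing it correctly can be a nightmare.

Use a library like phpQuery or the built-in DOMDocument; they know how to handle all of HTML's quirks.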
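
As a rough illustration of the DOMDocument route, here is a minimal sketch. The function name `stripPheedoLinks` and the wrapper `div` with id `wrap` are made up for this example, and it assumes the input is an HTML fragment (not a full page) encoded as UTF-8. It removes every `<a>` element whose `href` host is pheedo.com or a subdomain of it, together with the image inside.

```php
<?php
// Hypothetical helper (not from the original answer): drop <a> elements
// that link to pheedo.com, including whatever they contain.
function stripPheedoLinks(string $html): string
{
    $doc = new DOMDocument();

    // Real-world markup is often sloppy; collect parser warnings quietly.
    $previous = libxml_use_internal_errors(true);
    // The XML encoding hint keeps UTF-8 text intact; the wrapper div lets
    // us find the fragment again after the parser adds <html>/<body>.
    $doc->loadHTML('<?xml encoding="UTF-8"><div id="wrap">' . $html . '</div>');
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);

    // Copy the node list to an array first, so removing nodes does not
    // disturb the iteration.
    foreach (iterator_to_array($xpath->query('//a[@href]')) as $link) {
        $host = parse_url($link->getAttribute('href'), PHP_URL_HOST);
        if (is_string($host) && preg_match('/(^|\.)pheedo\.com$/i', $host)) {
            $link->parentNode->removeChild($link);
        }
    }

    // Serialize only the children of the wrapper, i.e. the original fragment.
    $out = '';
    $wrap = $xpath->query('//div[@id="wrap"]')->item(0);
    foreach ($wrap->childNodes as $child) {
        $out .= $doc->saveHTML($child);
    }
    return $out;
}
```

Matching on the parsed host rather than on the raw string means a link that merely mentions pheedo.com in its query string would be left alone. Note that the serialized output may differ cosmetically from the input (attribute quoting, self-closing tags), since it is regenerated from the parsed tree.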

Answer 2

For a more general approach (text/HTML ads, different URLs on the same domain, and so on), you could try

<a[^>]*href="[^"]*pheedo\.com[^"]*".*?</a>

Just replace any matches you find. Note that you will run into problems if there are nested <a/> tags.

Answer 3

This should match the tag (written in PHP):

$regex = "#<a href=\"http:\/\/www\.pheedo\.com[^>]+><img[^>]+><\/a>#";
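
A minimal sketch of how that pattern might be applied with `preg_replace`, assuming the ad always has exactly this shape (an `<a>` whose first attribute is the pheedo.com `href`, wrapping a single `<img>`). The variable names `$html` and `$clean` and the surrounding `<p>` are invented for the example; the pattern is written in single quotes here, which avoids the backslash-escaped quotes and slashes but matches the same text.

```php
<?php
// Same pattern as above; with `#` as the delimiter, `/` needs no escaping.
$regex = '#<a href="http://www\.pheedo\.com[^>]+><img[^>]+></a>#';

// Sample input: some story markup followed by the ad from the question.
$html = '<p>Story text</p>'
      . '<a href="http://www.pheedo.com/click.phdo?s=xxxxxxxx&amp;p=1">'
      . '<img border="0" src="http://www.pheedo.com/img.phdo?s=xxxxxxxxxx&amp;p=1" style="border: 0pt none ;" alt=""/>'
      . '</a>';

// Replace every match with an empty string to strip the ad.
$clean = preg_replace($regex, '', $html);

echo $clean, PHP_EOL;
```

If the markup changes even slightly (extra attributes before `href`, whitespace between the tags, text inside the link), this pattern will no longer match, which is the usual trade-off of the regex approach.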

Answer 4

    $text = '<a href="http://www.pheedo.com/click.phdo?s=xxxxxxxx&amp;p=1"><img border="0" src="http://www.pheedo.com/img.phdo?s=xxxxxxxxxx&amp;p=1" style="border: 0pt none ;" alt=""/></a>';
    $reg = "/href=\"(http:\/\/\S+?)\"/i";
    preg_match_all($reg, $text, $matches, PREG_PATTERN_ORDER);

    // $matches[1] now holds every captured href URL, here the pheedo.com click link

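I did it this way so that you can pass a whole page to preg and get all the matching results back in an array.

If you're interested, I built something similar to make this image search tool: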

http://www.iansimpsonarchitects.com/fraser

You can see the full PHP source from the link on that page.

F.

Removing ads from an HTML string

4 answers: