Question

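I'm trying to strip whitespace and dots from hyperlinks. All of the rules work, except that the dot is not removed from the URL. Here are a few examples: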

 <a href="   http://www.example.com   ">example site</a>
 <a href="   http://www.example.com">example 2</a>
 <a href="http://www.example.com.">final example</a>


  $text = preg_replace('/<a href="([\s]+)?([^ "\']*)([\s]+)?(\.)?">([^<]*)<\/a>/', '<a href="\\2">\\5</a>', $text);

In the last example, the regex should remove the dot from the URL. The dot is optional, so I wrote the rule (\.)?

Answer 1

How about <a href="([\s]+)?([^ "\']*\.[a-zA-Z]{2,5})([\s]+)?(\.)?">([^<]*)<\/a>? The addition is \.[a-zA-Z]{2,5}.

It will capture .com, .info, .edu, and even something like .com.au.
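A minimal sketch of how this pattern could be dropped into the preg_replace call from the question. The function name stripTrailingDot is hypothetical; the pattern and the \\2 / \\5 replacement are taken from the text above.

```php
<?php
// Hypothetical helper: applies the suffix-anchored pattern to a block of HTML.
function stripTrailingDot(string $text): string
{
    // Group 2 must end in a dot plus 2-5 letters (.com, .info, .au, ...),
    // so a final "." can no longer be swallowed by the URL group.
    $pattern = '/<a href="([\s]+)?([^ "\']*\.[a-zA-Z]{2,5})([\s]+)?(\.)?">([^<]*)<\/a>/';
    // preg_replace() returns null on a regex error; fall back to the input.
    return preg_replace($pattern, '<a href="\\2">\\5</a>', $text) ?? $text;
}

echo stripTrailingDot('<a href="http://www.example.com.">final example</a>'), "\n";
// prints: <a href="http://www.example.com">final example</a>
```

One consequence of anchoring on the suffix: an href that does not end in a dot followed by 2-5 letters (for example a URL ending in /about) will not match at all, so that link is left unchanged.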

Answer 2

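Because your dot has already been matched by the ([^ "\']*) group.

Change it to ([^ "\']*?), the ungreedy version.

Also, I'd suggest replacing ([\s]+)?(\.)? with [\s.]*, which matches any mix of trailing whitespace and dots, to handle strings like "www.example.com."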
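A minimal sketch of both changes applied to the pattern from the question. The function name cleanLinks is hypothetical. Note one side effect that is easy to miss: [\s.]* has no capture groups, so the link text moves from group 5 to group 3 and the replacement string has to change with it.

```php
<?php
// Hypothetical helper: trims whitespace and trailing dots inside href="...".
function cleanLinks(string $text): string
{
    // ([^ "']*?) is lazy, so it grows only until [\s.]*" can match;
    // [\s.]* then absorbs any trailing run of whitespace and dots.
    $pattern = '/<a href="([\s]+)?([^ "\']*?)[\s.]*">([^<]*)<\/a>/';
    // Group 2 is the URL, group 3 is now the link text.
    return preg_replace($pattern, '<a href="\\2">\\3</a>', $text) ?? $text;
}

$html = '<a href="   http://www.example.com   ">example site</a>' . "\n"
      . '<a href="   http://www.example.com">example 2</a>' . "\n"
      . '<a href="http://www.example.com.">final example</a>';

echo cleanLinks($html), "\n";
```

With the three examples from the question, each href should come out as http://www.example.com.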

Answer 3

This will trim the hrefs (I think you mean trimming them).

For both the ' and " value delimiters (expanded):

(<a \s+ href \s* = \s*)
(?|
     (") \s* ([^"]*?) [\.\s]* (")
  |  (') \s* ([^']*?) [\.\s]* (')
)
([^>]*>)

The replacement is: $1$2$3$4$5

Or,

For the " value delimiter only (expanded):

(<a \s+ href \s* = \s* ")
\s* 
([^"]*?)
[\.\s]*
(" [^>]*>)

The replacement is: $1$2$3
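As a rough sketch, the two-delimiter variant could be used from PHP as follows. The function name trimHrefs and the ~ delimiter are assumptions made for illustration. The x modifier is what allows the pattern to keep its "expanded" layout, and the branch-reset group (?| ... ) gives both alternatives the same group numbers 2 to 4.

```php
<?php
// Hypothetical helper: trims whitespace and dots around href values,
// for both double-quoted and single-quoted attributes.
function trimHrefs(string $html): string
{
    // x modifier: whitespace outside character classes is ignored.
    $pattern = '~
        (<a \s+ href \s* = \s*)
        (?|
             (") \s* ([^"]*?) [\.\s]* (")
          |  (\') \s* ([^\']*?) [\.\s]* (\')
        )
        ([^>]*>)
    ~x';
    // $1 = "<a href=", $2/$4 = the quotes, $3 = trimmed URL, $5 = rest of the tag.
    return preg_replace($pattern, '$1$2$3$4$5', $html) ?? $html;
}

echo trimHrefs('<a href="   http://www.example.com   ">example site</a>'), "\n";
echo trimHrefs("<a href='http://www.example.com. '>final example</a>"), "\n";
```

Because group 5 captures everything up to the closing >, any attributes that follow href are carried over untouched.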

Answer 4

The following has not been tested.

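$doc = new DOMDocument;
$doc->preserveWhiteSpace = false;
// loadHTMLFile() parses HTML; load() expects well-formed XML
$doc->loadHTMLFile('source.html');

$xpath = new DOMXPath($doc);

// '//a[@href]' selects every anchor with an href anywhere in the document;
// a bare 'a' would only match direct children of the context node
$anchors = $xpath->query('//a[@href]');

foreach ($anchors as $aElement) {
    // Strip leading/trailing whitespace and dots from the href value
    $aElement->setAttribute('href', trim($aElement->getAttribute('href'), " \t\n\r."));
}

$doc->saveHTMLFile('new-source.html');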

Stripping whitespace and dots from hyperlinks
