I need to process the links in an HTML string in several different ways.
$str = 'My long <a href="http://example.com/abc" rel="link">string</a> has any
<a href="/local/path" title="with attributes">number</a> of
<a href="#anchor" data-attr="lots">links</a>.';
$links = extractLinks($str);
foreach ($links as $link) {
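// The /e modifier only ever applied to preg_replace() and has been removed from PHP, so it is dropped here.
$pattern = "#((http|https|ftp)://(\S*?\.\S*?))(\s|\;|\)|\]|\[|\{|\}|,|\"|'|:|\<|$|\.\s)#i";
// Test the individual link's href rather than the whole string
if (preg_match($pattern, $link['href'])) {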
// Process Remote links
// For example, replace url with short url,
// or replace long anchor text with truncated
} else {
// Process Local Links, Anchors
}
}
function extractLinks($str) {
// First, I tried DomDocument
$dom = new DomDocument();
$dom->loadHTML($str);
return $dom->getElementsByTagName('a');
// But this just returns:
// DOMNodeList Object
// (
// [length] => 3
// )
// Then I tried Regex
if(preg_match_all("|<a.*(?=href=\"([^\"]*)\")[^>]*>([^<]*)</a>|i", $str, $matches)) {
print_r($matches);
}
// But this didn't work either.
}
Ideal result of extractLinks($str):
[0] => Array(
'str' => '<a href="http://example.com/abc" rel="link">string</a>',
'href' => 'http://example.com/abc',
'anchorText' => 'string'
),
[1] => Array(
'str' => '<a href="/local/path" title="with attributes">number</a>',
'href' => '/local/path',
'anchorText' => 'number'
),
[2] => Array(
'str' => '<a href="#anchor" data-attr="lots">links</a>',
'href' => '#anchor',
'anchorText' => 'links'
)
I need all of these so that I can do things like edit the href (add tracking, shorten it, etc.) or replace the whole tag with something else (<a href="/u/username">username</a> might become username).
Here is a demo of what I am trying to do.
Answer 0 (score: 12)
You just need to change it to:
$str = 'My long <a href="http://example.com/abc" rel="link">string</a> has any
<a href="/local/path" title="with attributes">number</a> of
<a href="#anchor" data-attr="lots">links</a>.';
$dom = new DomDocument();
$dom->loadHTML($str);
$output = array();
foreach ($dom->getElementsByTagName('a') as $item) {
$output[] = array (
'str' => $dom->saveHTML($item),
'href' => $item->getAttribute('href'),
'anchorText' => $item->nodeValue
);
}
By putting it in a loop and using getAttribute, nodeValue, and saveHTML(THE_NODE), you will get the output you want.
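As a rough sketch, the loop above could be wrapped into the extractLinks() function the question asks for, and each href then classified as remote or local. The helper name isRemoteLink(), the use of parse_url() in place of the question's regex, the empty-input guard, and the libxml error suppression are all assumptions added for illustration and are not part of the original answer.

```php
<?php
// Sketch only: returns one array per <a> element, shaped like the
// "ideal result" in the question.
function extractLinks(string $html): array
{
    if (trim($html) === '') {
        return []; // loadHTML() rejects empty input
    }

    $dom = new DOMDocument();
    // Collect parser warnings internally instead of emitting them.
    $previous = libxml_use_internal_errors(true);
    $dom->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $links = [];
    foreach ($dom->getElementsByTagName('a') as $a) {
        $links[] = [
            'str'        => $dom->saveHTML($a),       // the whole tag
            'href'       => $a->getAttribute('href'),
            'anchorText' => $a->nodeValue,
        ];
    }
    return $links;
}

// Hypothetical helper: treats an href as remote when it has an
// http, https, or ftp scheme; paths and #anchors have no scheme.
function isRemoteLink(string $href): bool
{
    $scheme = parse_url($href, PHP_URL_SCHEME);
    return is_string($scheme)
        && in_array(strtolower($scheme), ['http', 'https', 'ftp'], true);
}

$str = 'My long <a href="http://example.com/abc" rel="link">string</a> has any
<a href="/local/path" title="with attributes">number</a> of
<a href="#anchor" data-attr="lots">links</a>.';

$links = extractLinks($str);
foreach ($links as $link) {
    $kind = isRemoteLink($link['href']) ? 'remote' : 'local';
    echo $kind, ': ', $link['href'], ' (', $link['anchorText'], ')', PHP_EOL;
}
```

Checking the scheme of each extracted href avoids re-scanning the whole string for every link, which is what the loop in the question does.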
Answer 1 (score: 4)
Like this:
<a\s*href="([^"]+)"[^>]+>([^<]+)</a>
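Use preg_match($pattern, $string, $m). The pattern has two capture groups, so the results will be in $m[0] (the whole tag), $m[1] (the href), and $m[2] (the anchor text). To get every link rather than just the first, use preg_match_all: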
$string = 'My long <a href="http://example.com/abc" rel="link">string</a> has any
<a href="/local/path" title="with attributes">number</a> of
<a href="#anchor" data-attr="lots">links</a>. ';
$regex='|<a\s*href="([^"]+)"[^>]+>([^<]+)</a>|';
$howmany = preg_match_all($regex,$string,$res,PREG_SET_ORDER);
print_r($res);
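To get from the PREG_SET_ORDER result to the array shape the question asks for, each match set can be mapped onto named keys. The following is a minimal sketch using the same pattern; the key names come from the question, and the $links variable is an assumption for illustration. Note that this pattern only matches when href is the first attribute and at least one more character follows it before the closing ">", so a tag such as <a href="/u/username">username</a> would be skipped. A DOM parser is usually more robust for arbitrary HTML.

```php
<?php
$string = 'My long <a href="http://example.com/abc" rel="link">string</a> has any
<a href="/local/path" title="with attributes">number</a> of
<a href="#anchor" data-attr="lots">links</a>.';

// Group 1 captures the href value, group 2 the anchor text.
$regex = '|<a\s*href="([^"]+)"[^>]+>([^<]+)</a>|';

$links = [];
if (preg_match_all($regex, $string, $res, PREG_SET_ORDER)) {
    // With PREG_SET_ORDER, each $m holds one complete match:
    // $m[0] is the full tag, $m[1] and $m[2] are the capture groups.
    foreach ($res as $m) {
        $links[] = [
            'str'        => $m[0],
            'href'       => $m[1],
            'anchorText' => $m[2],
        ];
    }
}

print_r($links);
```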