I am scraping website data with Simple HTML DOM, but I have a problem converting relative URLs to absolute ones. Suppose the page is at http://www.example.com/tutorial.html. The content I extract from it contains relative links, and I want all of them to be absolute. For example:
$string = "<p>this is text within string</p> and more random strings which contains link like <a href='docs/555text.fileextension'>Download this file</a> <p>Other html follows where another relative link may exist like <a href='files/doc.doc'>This file</a>";
I would like to get something like:
$string = "<p>this is text within string</p> and more random strings which contains link like <a href='http://www.example.com/docs/555text.fileextension'>Download this file</a> <p>Other html follows where another relative link may exist like <a href='http://www.example.com/files/doc.doc'>This file</a>";
That is, convert every relative URL to an absolute one while keeping the rest of the $string content unchanged.
When I try the solution given below, it does not work on the actual scraped data.
Answer 0 (score: 1)
You are right to use preg_replace. You can try this code:
// [^>]* matches zero or more characters other than >
// both single and double quotes around the href value are supported
$regex = '~<a([^>]*)href=["\']([^"\']*)["\']([^>]*)>~';
// the replacement reuses the three captured groups and
// prepends the base URL to the href value ($2)
// note: every href is prefixed, including ones that are already absolute
$replace = '<a$1href="http://www.example.com/$2"$3>';
$string = "<p>this is text within string</p> and more random strings which contains link like <a href='docs/555text.fileextension'>Download this file</a> <p>Other html follows where another relative link may exist like <a href='files/doc.doc'>This file</a>";
$replaced = preg_replace($regex, $replace, $string);
Result:
<p>this is text within string</p> and more random strings which contains link like <a href="http://www.example.com/docs/555text.fileextension">Download this file</a> <p>Other html follows where another relative link may exist like <a href="http://www.example.com/files/doc.doc">This file</a>
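
One possible reason the pattern above fails on real scraped data is that it prefixes every href, so links that are already absolute (or that start with / or #) end up broken. The following is a minimal sketch of a variant that rewrites only relative links, using preg_replace_callback and parse_url. The function name absolutizeLinks and its parameters are hypothetical and not part of the original answer. It also assumes the base URL has a scheme and host, and it does not normalize ../ segments or handle query-only links such as ?page=2.

```php
<?php
// Hypothetical helper (not from the original answer):
// make relative hrefs in <a> tags absolute, leave all other hrefs alone.
function absolutizeLinks(string $html, string $baseUrl): string
{
    // Split the base URL into its origin and the directory of the page.
    $parts  = parse_url($baseUrl);
    $origin = $parts['scheme'] . '://' . $parts['host']
            . (isset($parts['port']) ? ':' . $parts['port'] : '');
    $path   = $parts['path'] ?? '/';
    // Directory part of the path, ending in "/" (e.g. "/" for "/tutorial.html").
    $dir    = substr($path, 0, strrpos($path, '/') + 1);

    // Group 1: everything up to the opening quote, group 2: the quote character,
    // group 3: the href value up to the matching closing quote.
    $regex = '~(<a\s+(?:[^>]*?\s+)?href\s*=\s*)(["\'])(.*?)\2~i';

    return preg_replace_callback($regex, function (array $m) use ($origin, $dir) {
        $url = $m[3];
        // Skip empty values, URLs with a scheme (http:, mailto:, ...),
        // protocol-relative URLs (//host/...) and fragments (#top).
        if ($url === '' || preg_match('~^(?:[a-z][a-z0-9+.-]*:|//|#)~i', $url)) {
            return $m[0];
        }
        // Root-relative paths attach to the origin, others to the page directory.
        $absolute = $url[0] === '/' ? $origin . $url : $origin . $dir . $url;
        // Rebuild the attribute with the original quote style.
        return $m[1] . $m[2] . $absolute . $m[2];
    }, $html);
}
```

Calling absolutizeLinks($string, 'http://www.example.com/tutorial.html') on the $string from the question should turn docs/555text.fileextension and files/doc.doc into links under http://www.example.com/, while keeping the original single quotes. For arbitrary real-world HTML, a DOM parser is generally more reliable than a regular expression.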