我已经看到了一些答案(比如this one),但我有一些更复杂的情况,我不知道如何解释。
我基本上拥有完整的HTML文档。我需要用绝对URL替换每个相对URL。
来自潜在HTML的元素如下所示,也可能是其他情况:
<img src="/relative/url/img.jpg" />
<form action="/">
<form action="/contact-us/">
<a href='/relative/url/'>Note the Single Quote</a>
<img src="//example.com/protocol-relative-img.jpg" />
期望的输出将是:
// "//example.com/" is ideal, but "http(s)://example.com/" are acceptable
<img src="//example.com/relative/url/img.jpg" />
<form action="//example.com/">
<form action="//example.com/contact-us/">
<a href='//example.com/relative/url/'>Note the Single Quote</a>
<img src="//example.com/protocol-relative-img.jpg" /> <!-- Unmodified -->
我不想替换协议相对URL,因为它们已经充当绝对URL。我想出了一些有用的代码,但是我想知道我是否可以将它清理一下,因为它非常>非常重复。
但我必须考虑src
,href
和action
的单引号和双引号属性值(我是否缺少可以拥有相对网址的任何属性?)同时避免协议相对URL。
这是我到目前为止所拥有的:
// Make URL replacement protocol relative to not break insecure/secure links
$url = str_replace( array( 'http://', 'https://' ), '//', $url );
// Temporarily Modify Protocol-Relative URLS
$str = str_replace( 'src="//', 'src="::TEMP_REPLACE::', $str );
$str = str_replace( "src='//", "src='::TEMP_REPLACE::", $str );
$str = str_replace( 'href="//', 'href="::TEMP_REPLACE::', $str );
$str = str_replace( "href='//", "href='::TEMP_REPLACE::", $str );
$str = str_replace( 'action="//', 'action="::TEMP_REPLACE::', $str );
$str = str_replace( "action='//", "action='::TEMP_REPLACE::", $str );
// Replace all other Relative URLS
$str = str_replace( 'src="/', 'src="'. $url .'/', $str );
$str = str_replace( "src='/", "src='". $url ."/", $str );
$str = str_replace( 'href="/', 'href="'. $url .'/', $str );
$str = str_replace( "href='/", "href='". $url ."/", $str );
$str = str_replace( 'action="/', 'action="'. $url .'/', $str );
$str = str_replace( "action='/", "action='". $url ."/", $str );
// Change Protocol Relative URLs back
$str = str_replace( 'src="::TEMP_REPLACE::', 'src="//', $str );
$str = str_replace( "src='::TEMP_REPLACE::", "src='//", $str );
$str = str_replace( 'href="::TEMP_REPLACE::', 'href="//', $str );
$str = str_replace( "href='::TEMP_REPLACE::", "href='//", $str );
$str = str_replace( 'action="::TEMP_REPLACE::', 'action="//', $str );
$str = str_replace( "action='::TEMP_REPLACE::", "action='//", $str );
我的意思是,它有效,但它是 uuugly ,我认为可能有更好的方法。
答案 0 :(得分:3)
我认为<base>
元素就是你要找的......
https://developer.mozilla.org/en-US/docs/Web/HTML/Element/base
<base>
是<head>
中的空元素。使用<base href="https://example.com/path/" />
会告诉文档中的所有相对网址引用https://example.com/path/
而不是父网址
答案 1 :(得分:2)
如果我理解正确,您会考虑基本值,并且只想将其应用于相对路径。
代码:(Demo)
$html=<<<HTML
<img src="/relative/url/img.jpg" />
<form action="/">
<a href='/relative/url/'>Note the Single Quote</a>
<img src="//site.com/protocol-relative-img.jpg" />
HTML;
$base='https://example.com';
echo preg_replace('~(?:src|action|href)=[\'"]\K/(?!/)[^\'"]*~',"$base$0",$html);
输出:
<img src="https://example.com/relative/url/img.jpg" />
<form action="https://example.com/">
<a href='https://example.com/relative/url/'>Note the Single Quote</a>
<img src="//site.com/protocol-relative-img.jpg" />
模式细分:
~ #Pattern delimiter
(?:src|action|href) #Match: src or action or href
= #Match equal sign
[\'"] #Match single or double quote
\K #Restart fullstring match (discard previously matched characters
/ #Match slash
(?!/) #Negative lookahead (zero-length assertion): must not be a slash immediately after first matched slash
[^\'"]* #Match zero or more non-single/double quote characters
~ #Pattern delimiter