$html = file_get_contents("any site");
$dom = new domDocument;
@$dom->loadHTML($html);
$dom->preserveWhiteSpace = false;
$images = $dom->getElementsByTagName('img');
foreach ($images as $image) {
    echo $image->src;
}
This prints nothing.
$html = file_get_contents("any site");
$dom = new domDocument;
@$dom->loadHTML($html);
$dom->preserveWhiteSpace = false;
$images = $dom->getElementsByTagName('img');
foreach ($images as $image) {
    echo $image->getAttribute('src');
}
This returns relative URLs, for example "/images/example.jpg".
$html = file_get_contents("any site");
$dom = new domDocument;
@$dom->loadHTML($html);
$dom->preserveWhiteSpace = false;
$images = $dom->getElementsByTagName('img');
foreach ($images as $image) {
    echo $image.src;
}
This gives me:
Fatal error: Call to undefined function getElementsByTagName()
So, how can I get the absolute path?
Answer 0 (score: 1)
You can use parse_url to find the base URL:
$url = 'http://www.example.com/path?opt=234';
$parts = parse_url($url);
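if (isset($parts['scheme'])){
    $base_url = $parts['scheme'].'://';
} else {
    // no scheme given: assume http and parse again so the host is detected
    $base_url = 'http://';
    $parts = parse_url($base_url.$url);
}
$base_url .= $parts['host'];
// The path is deliberately not appended: a root-relative src such as
// "/images/example.jpg" resolves against the scheme and host only.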
Then combine it with your code, like this:
$html = file_get_contents($url);
$dom = new domDocument;
@$dom->loadHTML($html);
$dom->preserveWhiteSpace = false;
$images = $dom->getElementsByTagName('img');
foreach ($images as $image) {
    echo $base_url.$image->getAttribute('src');
}
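As a minimal sketch of the same idea that runs without a network request, the snippet below parses an inline HTML string and prefixes each root-relative src with the scheme and host of the page. The helper name origin_of, the variable $page_url, and the sample markup are made up for illustration and are not part of the answer above. It assumes every src starts with a slash.

```php
<?php
// Hypothetical helper: returns "scheme://host" for a URL, assuming http
// when the URL has no scheme.
function origin_of(string $url): string {
    $parts = parse_url($url);
    if ($parts === false || !isset($parts['host'])) {
        // Without a scheme, parse_url() treats the whole string as a path,
        // so retry with an assumed "http://" prefix.
        $parts = parse_url('http://' . $url);
    }
    $scheme = $parts['scheme'] ?? 'http';
    return $scheme . '://' . $parts['host'];
}

// Assumed sample data standing in for a fetched page.
$page_url = 'http://www.example.com/path?opt=234';
$html = '<html><body><img src="/images/a.jpg"><img src="/images/b.png"></body></html>';

$dom = new DOMDocument;
@$dom->loadHTML($html);

$absolute = [];
foreach ($dom->getElementsByTagName('img') as $image) {
    // Root-relative paths are joined to the scheme and host only.
    $absolute[] = origin_of($page_url) . $image->getAttribute('src');
}
print_r($absolute);
```

A src without a leading slash, or one containing "../", would need proper path resolution instead of plain concatenation.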
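Answer 1 (score: 1)
This code distinguishes src attributes that contain a relative URL from those that contain a full URL. It is more robust than simple string concatenation, and it handles relative paths that do not begin with a slash, for example images/image.jpg as well as /images/image.jpg.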
<?php
$site = 'http://example.com/some/deeply/buried/page.html';
$dir = dirname($site);
$html = file_get_contents($site);
$dom = new domDocument;
@$dom->loadHTML($html);
$dom->preserveWhiteSpace = false;
$images = $dom->getElementsByTagName('img');
foreach ($images as $image) {
    // get the img src attribute
    $img_path = $image->getAttribute('src');
    // parse the path into its constituent parts
    $url_info = parse_url($img_path);
    // if the host part is set, we're dealing with a fully qualified URL
    // and leave it alone; otherwise resolve the relative path
    if (!isset($url_info['host']) && isset($url_info['path'])) {
        $path = $url_info['path'];
        if (substr($path,0,1) === '/') {
            // root-relative: join it to the scheme and host of the site
            $site_info = parse_url($site);
            $img_path = $site_info['scheme'].'://'.$site_info['host'].$path;
        } else {
            // document-relative: join it to the directory of the page
            $img_path = $dir.'/'.$path;
        }
    }
    echo $img_path; // this should be a full URL
}
?>
Answer 2 (score: 1)
This works for me; give it a try:
<?php
echo path_to_absolute(
    "../images/example.jpg", /* image url */
    "http://php.net/manual/en/" /* current page url */,
    false /* true if the page url ends with a file name, like "http://server.com/file.html" */
);
function path_to_absolute( $src, $base = null, $has_filename = false ) {
    if ( $has_filename && !in_array( substr( $src, 0, 1 ), array( "?", "#" ) ) ) {
        /* strip the file name so that $base is the page's directory */
        $base = dirname( $base )."/";
    }
    elseif ( !$has_filename ) {
        $base = rtrim( $base, "/" )."/";
    }
    /* otherwise $src is a query string or fragment: keep the file name in $base */
    if ( parse_url( $src, PHP_URL_HOST ) ) {
        /* It's a full URL, so return it without modifying */
        return $src;
    }
    if ( substr( $src, 0, 1 ) == "/" ) {
        /* $src begins with a slash: join it to the scheme, host and port of $base */
        $parts = parse_url( $base );
        $port = isset( $parts['port'] ) ? ":".$parts['port'] : "";
        return $parts['scheme']."://".$parts['host'].$port.$src;
    }
    /* remove './' from $src, we don't need it */
    $src = ( substr( $src, 0, 2 ) === "./" ) ? substr( $src, 2 ) : $src;
    /* count how many directory levels we need to go up */
    $path = substr_count( $src, "../" );
    $src = str_ireplace( "../", "", $src );
    for( $i = 1; $i <= $path; $i++ ) {
        /* go up one level, but never above the host */
        if ( parse_url( dirname( $base ), PHP_URL_HOST ) ) {
            $base = dirname( $base ) . "/";
        }
    }
    return $base . $src;
}
?>
Example usage.
Here we look up the links on php.net, because it has plenty of relative links:
<?php
$url = "http://www.php.net/manual/en/tokens.php";
$html = file_get_contents( $url );
$dom = new DOMDocument;
@$dom->loadHTML( $html );
$dom->preserveWhiteSpace = false;
$links = $dom->getElementsByTagName( 'a' );
foreach( $links as $link ) {
    $original_url = $link->getAttribute( 'href' );
    $absolute_url = path_to_absolute( $original_url, $url, true );
    echo $original_url." - ".$absolute_url."\n";
}
/** prints...
* / - http://www.php.net/
* ...
* control-structures.while.php - http://www.php.net/manual/en/control-structures.while.php
* control-structures.do.while.php - http://www.php.net/manual/en/control-structures.do.while.php
* ...
* /sitemap.php - http://www.php.net/sitemap.php
* /contact.php - http://www.php.net/contact.php
* ...
* http://developer.yahoo.com/ - http://developer.yahoo.com/
* ...
* ?setbeta=1&beta=1 - http://www.php.net/manual/en/tokens.php?setbeta=1&beta=1
* ...
* #85872 - http://www.php.net/manual/en/tokens.php#85872
**/
?>
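Answer 3 (score: 0)
I think you should combine the second solution with the URL of 'any site', because an image's src attribute may contain only a relative path. From a web developer's point of view, there is no need to include the absolute path.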
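One way to read this suggestion is sketched below: keep the page URL around and resolve every src against it. The function name resolve_src and the sample URLs are assumptions for illustration. The sketch covers only three cases (src already has a host, src starts with a slash, src is relative to the page's directory) and does not collapse "../" segments or honor a base element.

```php
<?php
// Hypothetical helper: resolve an img src against the URL of the page it came from.
function resolve_src(string $src, string $page_url): string {
    // A src that already has a host is returned unchanged.
    if (parse_url($src, PHP_URL_HOST)) {
        return $src;
    }
    $page = parse_url($page_url);
    $origin = $page['scheme'] . '://' . $page['host'];
    // Root-relative: join to the scheme and host only.
    if (substr($src, 0, 1) === '/') {
        return $origin . $src;
    }
    // Document-relative: join to the directory part of the page path.
    $path = $page['path'] ?? '/';
    $dir = substr($path, 0, strrpos($path, '/') + 1);
    return $origin . $dir . $src;
}

$page_url = 'http://example.com/some/deeply/buried/page.html';
echo resolve_src('/images/image.jpg', $page_url), "\n";
// prints http://example.com/images/image.jpg
echo resolve_src('images/image.jpg', $page_url), "\n";
// prints http://example.com/some/deeply/buried/images/image.jpg
echo resolve_src('http://cdn.example.org/logo.png', $page_url), "\n";
// prints http://cdn.example.org/logo.png
```

In the loop from the question, this would be called as resolve_src($image->getAttribute('src'), $page_url), with $page_url holding the same address that was passed to file_get_contents.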