Question

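I am trying to use preg_match_all to grab all image links that start with http://i.ebayimg.com/ and end with .jpg from a page I am scraping, but I can't get it right :( I tried this, but it is not what I need: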

preg_match_all('/(http|https|ftp|ftps)\:\/\/[a-zA-Z0-9\-\.]+\.[a-zA-Z]{2,3}(\/\S*)?/', $contentas, $img_link);

I have the same problem with ordinary links. I don't know how to write a preg_match_all pattern for this:

<a class="link--muted" href="http://suchen.mobile.de/fahrzeuge/details.html?id=218930381&daysAfterCreation=7&isSearchRequest=true&withImage=true&scopeId=C&categories=Limousine&damageUnrepaired=NO_DAMAGE_UNREPAIRED&zipcode=&fuels=DIESEL&ambitCountry=DE&maxPrice=11000&minFirstRegistrationDate=2006-01-01&makeModelVariant1.makeId=3500&makeModelVariant1.modelId=20&pageNumber=1" data-touch="hover" data-touch-wrapper=".cBox-body--resultitem">

Thanks a lot!!!

Update: I am trying this on http://suchen.mobile.de/fahrzeuge/search.html?isSearchRequest=true&scopeId=C&makeModelVariant1.makeId=1900&makeModelVariant1.modelId=10&makeModelVariant1.modelDescription=&makeModelVariantExclusions%5B0%5D.makeId=&categories=Limousine&minSeats=&maxSeats=&doorCount=&minFirstRegistrationDate=2006-01-01&maxFirstRegistrationDate=&minMileage=&maxMileage=&minPrice=&maxPrice=11000&minPowerAsArray=&maxPowerAsArray=&maxPowerAsArray=PS&minPowerAsArray=PS&fuels=DIESEL&minCubicCapacity=&maxCubicCapacity=&ambitCountry=DE&zipcode=&q=&climatisation=&airbag=&daysAfterCreation=7&withImage=true&adLimitation=&export=&vatable=&maxConsumptionCombined=&emissionClass=&emissionsSticker=&damageUnrepaired=NO_DAMAGE_UNREPAIRED&numberOfPreviousOwners=&minHu=&usedCarSeals= to get the car links, the image links, and all the other information. The information part works and my script runs fine, but I have trouble grabbing the images and links. Here is my script:

<?php

        $id= $_GET['id'];
        $user= $_GET['user'];
        $login=$_COOKIE['login'];

    $query = mysql_query("SELECT pavadinimas,nuoroda,kuras,data,data_new from mobile where vartotojas='$user' and id='$id'");
    $rezultatas=mysql_fetch_row($query);

    $url = "$rezultatas[1]";

    $info = file_get_contents($url); 

function scrape_between($data, $start, $end){
$data = stristr($data, $start); 
$data = substr($data, strlen($start));
$stop = stripos($data, $end);
$data = substr($data, 0, $stop);
return $data;
  }
     // cut out the content block
    $turinys = scrape_between($info, '<div class="g-col-9">', '<footer class="footer">');
     // filtering: remove the paid "top" ads
    $contentas = preg_replace('/<div class="cBox-body cBox-body--topResultitem".*?>(.*?)<\/div>/', '' ,$turinys);
    // filtering done

      preg_match_all('/<span class="h3".*?>(.*?)<\/span>/',$contentas,$pavadinimas); 

      preg_match_all('/<span class="u-block u-pad-top-9 rbt-onlineSince".*?>(.*?)<\/span>/',$contentas,$data); 

      preg_match_all('/<span class="u-block u-pad-top-9".*?>(.*?)<\/span>/',$contentas,$miestas);

      preg_match_all('/<span class="h3 u-block".*?>(.*?)<\/span>/', $contentas, $kaina);

      preg_match_all('/<a[A-z0-9-_:="\.\/ ]+href="(http:\/\/suchen.mobile.de\/fahrzeuge\/[^"]*)"[A-z0-9-_:="\.\/ ]\s*>\s*<div/s', $contentas, $matches);

   print_r($pavadinimas);
   print_r($data);
   print_r($miestas);
   print_r($kaina);
   print_r($result);
   print_r($matches);

   ?>

Answer 1

1. To capture the src attribute of every img tag whose URL starts with http://i.ebayimg.com/ and ends with .jpg:

Regex: /src="((?:http|https):\/\/i\.ebayimg\.com\/[^"]+?\.jpg)"/i

Here is an example:

$re = '/src="((?:http|https):\/\/i\.ebayimg\.com\/[^"]+?\.jpg)"/i';
$str = "codeOfHTMLPage";
preg_match_all($re, $str, $matches);


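If you want to make sure the URL is captured only inside an img tag, use this regex instead (note that performance will degrade if the page is very long):

$re = '/<img[^>]*?src="((?:http|https):\/\/i\.ebayimg\.com\/[^"]+?\.jpg)"/i';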
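As a minimal sketch of how the captured URLs come back, the snippet below runs the src pattern against a made-up HTML fragment. The $html sample and its image paths are assumptions for illustration, not taken from the scraped page. preg_match_all puts the whole matches in $matches[0] and the first capture group, the URL itself, in $matches[1].

```php
<?php
// Hypothetical HTML fragment; the image paths are invented for illustration.
$html = '<img class="thumb" src="http://i.ebayimg.com/00/s/abc/1.jpg">'
      . '<img src="http://example.com/logo.jpg">'
      . '<img src="https://i.ebayimg.com/00/s/def/2.JPG">';

// Single-quoted pattern, so the backslashes reach the regex engine unchanged.
$re = '/src="((?:http|https):\/\/i\.ebayimg\.com\/[^"]+?\.jpg)"/i';

preg_match_all($re, $html, $matches);

// $matches[0] holds the full matches (including src="..."),
// $matches[1] holds only the captured URLs.
$imageUrls = $matches[1];
print_r($imageUrls);
```

The image on example.com is skipped because of the host check in the pattern, and the /i flag lets the .JPG suffix match as well.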

2. To capture the href attribute of every a tag whose URL starts with http://suchen.mobile.de/fahrzeuge/:

Regex: /href="((?:http|https):\/\/suchen\.mobile\.de\/fahrzeuge\/[^"]+)"/i

Here is an example:

$re = '/href="((?:http|https):\/\/suchen\.mobile\.de\/fahrzeuge\/[^"]+)"/i';
$str = "codeOfHTMLPage";
preg_match_all($re, $str, $matches);


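If you want to make sure the URL is captured only inside an a tag, use this regex instead (note that performance will degrade if the page is very long):

$re = '/<a\s[^>]*?href="((?:http|https):\/\/suchen\.mobile\.de\/fahrzeuge\/[^"]+)"/i';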
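The href pattern can be tried the same way. This is only a sketch: the anchor tags in $html are shortened, invented examples modelled on the link in the question. It also uses ~ as the pattern delimiter, which is a stylistic choice that avoids escaping every slash in the URL.

```php
<?php
// Hypothetical anchor tags; the query strings are shortened, invented examples.
$html = '<a class="link--muted" href="http://suchen.mobile.de/fahrzeuge/details.html?id=1&pageNumber=1" data-touch="hover">'
      . '<a href="http://example.com/fahrzeuge/details.html?id=2">'
      . '<a href="http://suchen.mobile.de/fahrzeuge/details.html?id=3">';

// With ~ as the delimiter, the slashes in the URL need no backslashes.
$re = '~href="((?:http|https)://suchen\.mobile\.de/fahrzeuge/[^"]+)"~i';

preg_match_all($re, $html, $matches);

// The captured URLs are in the first capture group.
$carLinks = $matches[1];
print_r($carLinks);
```

Only the two links on suchen.mobile.de are collected; the link on example.com is ignored.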

Answer 2

DOMDocument is more convenient:

libxml_use_internal_errors(true);
$dom = new DOMDocument;
$dom->loadHTMLFile($yourURL);

$imgNodes = $dom->getElementsByTagName('img');

$result = [];

foreach ($imgNodes as $imgNode) {
    $src = $imgNode->getAttribute('src');
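    $urlElts = parse_url($src);
    // skip relative or malformed URLs that lack a host or path
    if (!isset($urlElts['host'], $urlElts['path']))
        continue;
    // pathinfo() avoids passing explode()'s result to array_pop() by reference
    $ext = strtolower(pathinfo($urlElts['path'], PATHINFO_EXTENSION));
    if ($ext == 'jpg' && $urlElts['host'] == 'i.ebayimg.com')
        $result[] = $src;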
}

print_r($result);

To get the "normal" links, use the same approach (DOMDocument + parse_url).
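A minimal sketch of that approach for the links might look like the following. The $html string is a made-up stand-in for the downloaded page, and filtering on the /fahrzeuge/ path prefix is an assumption based on the link shown in the question.

```php
<?php
// Hypothetical HTML; in the real script this would be the downloaded page.
$html = '<html><body>'
      . '<a class="link--muted" href="http://suchen.mobile.de/fahrzeuge/details.html?id=1&amp;pageNumber=1">Car 1</a>'
      . '<a href="/relative/link.html">Relative</a>'
      . '<a href="http://example.com/fahrzeuge/details.html">Other host</a>'
      . '</body></html>';

libxml_use_internal_errors(true); // silence warnings from sloppy markup
$dom = new DOMDocument;
$dom->loadHTML($html);

$links = [];
foreach ($dom->getElementsByTagName('a') as $aNode) {
    $href = $aNode->getAttribute('href');
    $parts = parse_url($href);
    // skip relative or malformed URLs that lack a host or path
    if (!isset($parts['host'], $parts['path'])) {
        continue;
    }
    // keep only links on suchen.mobile.de whose path starts with /fahrzeuge/
    if ($parts['host'] === 'suchen.mobile.de'
        && strpos($parts['path'], '/fahrzeuge/') === 0) {
        $links[] = $href;
    }
}

print_r($links);
```

One difference from the regex version is that getAttribute() returns the attribute value with HTML entities already decoded, so an &amp;amp; in the markup comes back as a plain &amp; in the URL.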

How do I get all links with preg_match_all?
