I'm trying to get the href value from this HTML:
<a class="list-item clearfix" href="/en/rolex/submariner-date--id2334149.htm" id="watch-2334149" style="background-color: rgb(255, 255, 255);">
<span onclick="_gaq.push(['first._trackEvent','Click','search','watch-image-click']);_gaq.push(['second._trackEvent','Click','search','watch-image-click']);" class="pic ">
<span style="position:absolute">
<img width="100" height="100" alt="Rolex Submariner Date" src="" class="photo">
</span>
</span>
<span class="disc">
<span onclick="_gaq.push(['first._trackEvent','Click','search','watch-headline-click']);_gaq.push(['second._trackEvent','Click','search','watch-headline-click']);" class="watch-headline"><span class="underline">Rolex Submariner Date</span></span>
<span class="spec">
<span onmouseover="$('#infobox-title').text('Germany');$('#infobox-text').text('This dealer is from Augsburg, Germany.')" style="width: 21px;" class="flag">
<img width="16" height="16" alt="" src="http://cdn.chrono24.com/images/flags-icons/DE.png">
</span>
<span class="icon i-hasnostore"></span>
<span onmouseover="$('#infobox-title').text('Trusted Seller since 2004');$('#infobox-text').text('We have no knowledge about pending/unsolved disputes or complaints about this seller.')" class="icon i-trusted"></span>
<span onmouseover="$('#infobox-title').text('Retailer recommendations');$('#infobox-text').text('This watch retailer is recommended on Chrono24 by 1 other watch retailers.')" class="i-buddies">
<span class="icon buddie-count">1</span>
<span class="icon i-star-blue"></span>
</span>
<span onmouseover="$('#infobox-title').text('Trusted Seller since 2004');$('#infobox-text').text('We have no knowledge about pending/unsolved disputes or complaints about this seller.')" class="trustedseller">
<script type="text/javascript">
// <![CDATA[
document.write('Trusted Seller since 2004');
// ]]>
</script>Trusted Seller since 2004
</span>
<span style="width: 2px;" class="icon"></span>
<span onmouseover="$('#infobox-title').text('Premium Seller');$('#infobox-text').text('The Chrono24 Premium Seller Package is only available for Trusted Sellers who frequently use Chrono24.')" class="icon i-premium"></span>
<span onmouseover="$('#infobox-title').text('Premium Seller');$('#infobox-text').text('The Chrono24 Premium Seller Package is only available for Trusted Sellers who frequently use Chrono24.')" class="premiumseller">Premium</span>
</span>
<span onclick="_gaq.push(['first._trackEvent','Click','search','watch-desc-click']);_gaq.push(['second._trackEvent','Click','search','watch-desc-click']);" class="description">
Ref. No. 116610 LN; Steel; Automatic; Condition 0 (unworn); Year 2013; With Box; With Papers; Location: Germany, Augsburg; The current, the manufacturer's recommended retail price is 6800 Euro
</span>
<span class="availability">Availability: Available immediately</span>
</span>
<span class="pricebox">
<span onclick="_gaq.push(['first._trackEvent','Click','search','watch-price-click']);_gaq.push(['second._trackEvent','Click','search','watch-price-click']);" class="amount price"><span class="large">$ 7,961</span>
</span>
<span class="buttonbox">
<span onclick="_gaq.push(['first._trackEvent','Click','search','watch-button-click']);_gaq.push(['second._trackEvent','Click','search','watch-button-click']);" class="button-blue">
<span>
Watch details
</span>
</span>
</span>
</span>
</a>
preg_match_all('#<a href="(.+)">#',$html,$urlarr);
This doesn't return the href value at all, and I can't tell what's wrong with it.
Answer 0 (score: 2)
Don't use Regular Expressions on HTML; HTML is not regular!
You should look at SimpleXML and XPath, which are well suited to this job: http://php.net/manual/en/simplexmlelement.xpath.php
E.g:
$xml = new SimpleXMLElement($html);
// Select all "a" tags with href attributes
$links = $xml->xpath("//a[@href]");
// You probably want the first one
// (cast to string, since the attribute is a SimpleXMLElement object)
$href = (string) $links[0]["href"];
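One caveat: `SimpleXMLElement` expects well-formed XML, and scraped HTML often is not (the snippet in the question has unclosed `<img>` tags, for instance). A minimal sketch of one workaround, assuming `$html` holds the markup: parse it with `DOMDocument` first and convert the result with `simplexml_import_dom`. The sample string below is made up for illustration.

```php
<?php
// Hypothetical sample; an unclosed <img> like this is not well-formed XML.
$html = '<a class="list-item clearfix" href="/en/rolex/submariner-date--id2334149.htm">'
      . '<img alt="Rolex Submariner Date" src="">Rolex Submariner Date</a>';

// Collect libxml parse warnings internally instead of printing them.
libxml_use_internal_errors(true);

$dom = new DOMDocument();
$dom->loadHTML($html);          // tolerant HTML parser
libxml_clear_errors();

// Convert the DOM tree to SimpleXML so the same XPath query can be used.
$xml   = simplexml_import_dom($dom);
$links = $xml->xpath('//a[@href]');

// Attributes are SimpleXMLElement objects; cast to get a plain string.
$href = (string) $links[0]['href'];
echo $href, PHP_EOL;
```

The XPath expression is unchanged from the snippet above; only the way the document is loaded differs.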
Answer 1 (score: 1)
Rather than a regexp, you should use DOMDocument:
$dom = new DOMDocument;
$dom->loadHTML($html);
$dom->preserveWhiteSpace = false;
$link = $dom->getElementsByTagName("a");
$links = array();
for($i = 0; $i < $link->length; $i++) {
$links[] = $link->item($i)->getAttribute("href");
}
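On messy real-world markup, `loadHTML` tends to emit a warning for every problem it finds. A minimal sketch of the same loop wrapped in a helper that keeps those warnings quiet; the function name `extract_hrefs` and the sample markup are made up for illustration.

```php
<?php
/**
 * Return the href of every <a> element in an HTML string.
 * (Hypothetical helper, not part of any library.)
 */
function extract_hrefs(string $html): array
{
    // Buffer libxml warnings instead of printing them, then restore the old setting.
    $previous = libxml_use_internal_errors(true);

    $dom = new DOMDocument();
    $dom->loadHTML($html);

    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $links = [];
    foreach ($dom->getElementsByTagName('a') as $anchor) {
        // Skip anchors without an href (getAttribute would return '').
        if ($anchor->hasAttribute('href')) {
            $links[] = $anchor->getAttribute('href');
        }
    }
    return $links;
}

$sample = '<a class="list-item clearfix" href="/en/rolex/submariner-date--id2334149.htm">Rolex</a>'
        . '<a name="top">no link here</a>';
print_r(extract_hrefs($sample));
```

Iterating the node list with `foreach` is equivalent to the indexed `for` loop above; the `hasAttribute` check is an extra that drops anchors with no link.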
Answer 2 (score: 1)
Any of the DOM-based approaches should work. If you want to use a regex anyway, you can try this:
preg_match_all('~<a (?>[^>h]++|\Bh|h(?!ref\b))*href\s*=\s*["\']?\K[^"\'>\s]++~i', $html, $matches);
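For context, the pattern in the question, `<a href="(.+)">`, only matches when `href` comes immediately after `<a `; in the posted markup `class` comes first, so it finds nothing. A quick check of both patterns against a shortened version of that tag (the sample string is trimmed for illustration):

```php
<?php
// Shortened sample of the tag from the question (class comes before href).
$html = '<a class="list-item clearfix" href="/en/rolex/submariner-date--id2334149.htm" id="watch-2334149">';

// Pattern from the question: requires href to follow "<a " directly.
$strict = preg_match_all('#<a href="(.+)">#', $html, $urlarr);

// Pattern from this answer: skips other attributes, then \K drops
// everything matched so far so that only the URL is reported.
$loose = preg_match_all(
    '~<a (?>[^>h]++|\Bh|h(?!ref\b))*href\s*=\s*["\']?\K[^"\'>\s]++~i',
    $html,
    $matches
);

var_dump($strict);      // number of matches for the strict pattern
print_r($matches[0]);   // URLs found by the loose pattern
```

Because of `\K`, the URLs end up in `$matches[0]` (the full match) rather than in a capture group.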
If you only want to match the href in tags whose class attribute value is list-item clearfix, you can do the following:
$pattern = <<<'LOD'
~
(?(DEFINE)
(?<class> \b class \s* = \s* (["']) list-item \s+ clearfix \g{-1} )
(?<href_value> [^"'\s>]++ )
(?<href_start> \b href \s*=\s* ["']? )
(?<href_end> ['"\s] )
(?<content> (?> [^>hc]++ | \B[hc] | h(?!ref\b) | c(?!lass\b) )* )
)
<a \s+
\g<content>
(?J)
(?>
\g<class> \g<content> \g<href_start> (?<href> \g<href_value> )
|
\g<href_start> (?<href> \g<href_value> ) \g<href_end> \g<content> \g<class>
)
~xi
LOD;
preg_match_all($pattern, $html, $matches, PREG_SET_ORDER);
foreach($matches as $match) {
echo '<br>' . $match['href'];
}
Keep in mind that this is much easier with XPath:
$doc = new DOMDocument();
@$doc->loadHTML($html);
$xpath = new DOMXPath($doc);
$hrefs = $xpath->query("//a[@class='list-item clearfix']/@href");
foreach($hrefs as $href) {
print_r($href->nodeValue);
}
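Note that `@class='list-item clearfix'` compares the whole attribute string, so it would miss a tag whose classes are reordered or that carries an extra class. A sketch of a more tolerant query, using the common XPath 1.0 idiom of padding the class list with spaces and testing each token separately; the sample markup is made up for illustration.

```php
<?php
// Hypothetical sample: same classes as in the question, but reordered with an extra one.
$html = '<a class="clearfix featured list-item" href="/en/rolex/submariner-date--id2334149.htm">Rolex</a>'
      . '<a class="other" href="/ignored.htm">Other</a>';

libxml_use_internal_errors(true);   // keep parser warnings quiet
$doc = new DOMDocument();
$doc->loadHTML($html);
libxml_clear_errors();

$xpath = new DOMXPath($doc);

// Pad the normalized class list with spaces, then look for each token.
$query = "//a[contains(concat(' ', normalize-space(@class), ' '), ' list-item ')"
       . " and contains(concat(' ', normalize-space(@class), ' '), ' clearfix ')]/@href";

$found = [];
foreach ($xpath->query($query) as $href) {
    $found[] = $href->nodeValue;
}
print_r($found);
```

If the class attribute is known to be exactly `list-item clearfix`, the simpler equality test above is enough.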
Answer 3 (score: 0)
Parsing HTML with regular expressions is a bad idea (at least in this case). Use a DOM parser such as SimpleHTMLDOM for this purpose instead.
It's easy:
$html = str_get_html('...');
foreach($html->find('a') as $element)
echo $element->href;
Alternatively, you can load it from a file:
$html = file_get_html('...');
foreach($html->find('a') as $element)
echo $element->href;
You can do the same with the built-in DOM extension:
$dom = new DOMDocument();
$dom->loadHTML($html);
// grab all the <a> elements on the page
$xpath = new DOMXPath($dom);
$hrefs = $xpath->evaluate("/html/body//a"); //all <a> tags
$urlArray = array();
for ($i = 0; $i < $hrefs->length; $i++) {
$href = $hrefs->item($i);
$urlArray[] = $href->getAttribute('href');
}