I'm trying to get the product sizes from all the option elements of a specific select tag with the following content:
<select id="prodSize" name="prodSize">
<option value="9274">10D</option>
<option value="9275">10DD</option>
<option value="9276">10E</option>
<option value="9277">10F</option>
<option value="9279">10G</option>
<option value="9288">12D</option>
<option value="9289">12DD</option>
<option value="9290">12E</option>
<option value="9291">12F</option>
<option value="9301">14D</option>
<option value="9302">14DD</option>
<option value="9303">14E</option>
<option value="9304">14F</option>
<option value="9305">14FF</option>
<option value="9315">16D</option>
<option value="9317">16E</option>
<option value="9318">16F</option>
<option value="9319">16FF</option>
<option value="9320">16G</option>
</select>
I tried $x("//select[@id='prodSize']/option/text()") in Chrome DevTools,
and it returned all the values without any problem, but when I try to get them with DOMXPath:
$options = $xpath->query("//select[@id='prodSize']/option/text()");
or:
$options = $xpath->query("*/select[@id='prodSize']/option");
I get:
object(DOMNodeList)#40 (1) { ["length"]=> int(0) } object(DOMNodeList)#29 (1) { ["length"]=> int(0) }
object(DOMNodeList)#39 (1) { ["length"]=> int(0) } object(DOMNodeList)#41 (1) { ["length"]=> int(0) }
For clarity, here is the full code:
scrapCatUrl('http://.../shop-management/categories/maternity-lingerie.aspx', "//ul[@class='lvl2 visible']/li/a/@href");

// NOTE: $url (the site's base URL) is not defined anywhere in this snippet.

// Follow every category link found on the given page.
function scrapCatUrl($path, $query){
    $xpath = scrap($path);
    $links = $xpath->query($query);
    foreach($links as $link){
        echo 'Category'.' - '.$url.$link->nodeValue . '<br>';
        scrapProdUrl($url.$link->nodeValue);
    }
}

// Follow the product links of a category page (stops after a few products while testing).
function scrapProdUrl($path){
    $xpath = scrap($path);
    $links = $xpath->query("//a[@class='thumbObj']/@href");
    $i = 0;
    foreach($links as $link){
        echo 'Product'.' - '.$url.$link->nodeValue . '<br>';
        getProdData($url.$link->nodeValue);
        if($i > 2){
            die();
        }
        $i++;
    }
}

// Extract and print the data of a single product page.
function getProdData($path){
    $xpath = scrap($path);
    $description = $xpath->query("//meta[@name='description']/@content");
    $keywords = $xpath->query("//meta[@name='keywords']/@content");
    $title = $xpath->query("//h4[@class='h4-productdetail']/text()");
    $price = $xpath->query("//div[@class='productDetail']/span[@class='price']/text()");
    $images = $xpath->query("//div[@class='imgs']/img/@src");
    $fullDescription = $xpath->query("//div[@class='flash']/following-sibling::div[@class='clearer']/preceding-sibling::text()[preceding-sibling::div[@class='flash']]");
    $options = $xpath->query("//select[@id='prodSize']/option/text()");
    echo 'Meta Description'.' - '.$description->item(0)->nodeValue. '<br>';
    echo 'Meta Keywords'.' - '.$keywords->item(0)->nodeValue. '<br>';
    echo 'Title'.' - '.$title->item(0)->nodeValue. '<br>';
    echo 'Price'.' - '.$price->item(0)->nodeValue. '<br>';
    if($images->length > 1){
        foreach($images as $image){
            echo '<img src="'.$url.$image->nodeValue.'" />'. '<br>';
        }
    }
    elseif($images->length == 1){
        // Only one image: read it directly from the node list
        echo '<img src="'.$url.$images->item(0)->nodeValue.'" />'. '<br>';
    }
    // This loop prints nothing, because $options is empty
    foreach($options as $option){
        echo $option->nodeValue;
    }
}

// Download a page with cURL and return a DOMXPath object for it.
function scrap($path){
    $ch = curl_init($path);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    $page = curl_exec($ch);
    $dom = new DOMDocument();
    @$dom->loadHTML($page);
    $xpath = new DOMXpath($dom);
    return $xpath;
}
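I have tried several approaches that people suggested here, but I get the same result. Apart from this one, I have no problem getting any other element from the page: title, images, description.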
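One way to narrow this down is to run the same query against a static string instead of the cURL response. The sketch below is only a diagnostic aid. The helper name `inspectSelect()` and the two test strings are hypothetical and not part of the original code. It assumes, without confirmation from the page itself, that a script may fill in the options after the page loads. That would explain why Chrome DevTools sees them while DOMXPath does not.

```php
<?php
// Hypothetical helper (not from the original code): reports whether the parsed
// HTML contains the <select> at all, and which option labels DOMXPath can see.
function inspectSelect(string $html, string $id): array
{
    // Nothing to parse, e.g. when the cURL request failed.
    if ($html === '') {
        return ['selectFound' => false, 'sizes' => []];
    }

    $dom = new DOMDocument();
    // Collect libxml parse warnings internally instead of silencing them with @.
    $previous = libxml_use_internal_errors(true);
    $dom->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($dom);
    // Assumes $id contains no quote characters.
    $selects = $xpath->query("//select[@id='" . $id . "']");
    $options = $xpath->query("//select[@id='" . $id . "']/option");

    $sizes = [];
    foreach ($options as $option) {
        $sizes[] = trim($option->nodeValue);
    }

    return ['selectFound' => $selects->length > 0, 'sizes' => $sizes];
}

// Static markup shortened from the question: the query matches here.
$static = '<html><body><select id="prodSize" name="prodSize">'
    . '<option value="9274">10D</option>'
    . '<option value="9275">10DD</option>'
    . '</select></body></html>';
$withOptions = inspectSelect($static, 'prodSize');

// Assumed scenario: the server sends an empty <select> and a script adds the options later.
$empty = '<html><body><select id="prodSize" name="prodSize"></select></body></html>';
$withoutOptions = inspectSelect($empty, 'prodSize');

print_r($withOptions);
print_r($withoutOptions);
```

The same helper could be called on `$page` right after `curl_exec()` in `scrap()`. If it reports the select as found but returns no sizes, the options are probably missing from the server response, and the XPath expression is not the problem.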