Question

Please see the edit at the bottom.

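I am using XPath to scrape some data from a website. I am wondering whether I am using too many foreach() loops and whether there is a simpler way to walk the hierarchy. I feel I may be running too many queries, and that there may be a better way that uses just one.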

The hierarchy looks like this.

<ul class='item-list'>
    <li class='item' id='12345'>
        <div class='this-section'>
            <a href='http://www.thissite.com'>
                <img src='http://www.thisimage.com/image.png' attribute-one='4567' attribute-two='some-words' />
            </a>
        </div>
        <small class='sale-count'>Some Number</small>
    </li>
    <li class='item' id='34567'>
    <li class='item' id='48359'>
    <li class='item' id='43289'>
</ul>

So I did the following:

$dom = new DOMDocument;
@$dom->loadHTMLFile($file);
$xpath = new DOMXPath($dom);

$list = $xpath->query("//ul[@class='item-list']/li");

foreach($list as $list_item)
{
    $item['item_id'][] = $list_item->getAttribute('id');

    // Links inside this <li> whose href contains 'item'
    $links = $xpath->query("div[@class='this-section']//a[contains(@href, 'item')]", $list_item);

    foreach($links as $address)
    {
        $href = $address->getAttribute('href');
        // Keep only the part of the URL before the query string
        $item['link'][] = substr($href, 0, strpos($href, '?'));
    }

    // Any element in this section that carries the attribute-one attribute
    $other_data = $xpath->query("div[@class='this-section']//*[@attribute-one]", $list_item);

    foreach($other_data as $element)
    {
        $item['cost'][] = $element->getAttribute('attribute-one');
        $item['category'][] = $element->getAttribute('attribute-two');
        $item['name'][] = $element->getAttribute('attribute-three');
    }

    $sales = $xpath->query(".//small[@class='sale-count']", $list_item);

    foreach($sales as $sale)
    {
        // Keep only the text before the first space
        $item['sale'][] = substr($sale->textContent, 0, strpos($sale->textContent, ' '));
    }
}

Do I need to keep re-querying to work my way down the hierarchy, or is there a simpler way to achieve this?

Edit: It seems I was indeed using too many foreach loops. For each one I removed, I saved a considerable amount of memory. So my question becomes:

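Given a parent element (in this case the <li>), is there a way to select elements and attributes without re-querying and looping over the results? I need to eliminate as many of these XPath sub-queries and foreach loops as possible.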

Answer 1

Sure, you can use DOMElement::getElementsByTagName() instead:

$images = $list_item->getElementsByTagName( 'img');
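
As a rough sketch of how that might look for markup like the one in the question: the HTML string, the "12 sold" text, and the $items array below are illustrative assumptions, not the asker's real page. Each <li> is read with getElementsByTagName() and item(0), so no nested XPath query or inner foreach is needed:

```php
<?php
// Illustrative markup modelled on the question; the real page will differ.
$html = <<<HTML
<ul class="item-list">
  <li class="item" id="12345">
    <div class="this-section">
      <a href="http://www.thissite.com/item?ref=1">
        <img src="http://www.thisimage.com/image.png" attribute-one="4567" attribute-two="some-words" />
      </a>
    </div>
    <small class="sale-count">12 sold</small>
  </li>
</ul>
HTML;

$dom = new DOMDocument;
libxml_use_internal_errors(true);   // collect parser warnings instead of silencing with @
$dom->loadHTML($html);
libxml_clear_errors();

$xpath = new DOMXPath($dom);
$items = [];

foreach ($xpath->query("//ul[@class='item-list']/li") as $list_item) {
    // item(0) returns the first match, or null when there is none.
    $img   = $list_item->getElementsByTagName('img')->item(0);
    $small = $list_item->getElementsByTagName('small')->item(0);

    $items[] = [
        'item_id'  => $list_item->getAttribute('id'),
        'cost'     => $img ? $img->getAttribute('attribute-one') : null,
        'category' => $img ? $img->getAttribute('attribute-two') : null,
        'sale'     => $small ? trim($small->textContent) : null,
    ];
}

print_r($items);
```

Note that getElementsByTagName() matches every descendant with that tag name regardless of class, so it is less selective than the XPath expressions in the question. That is probably fine when each <li> holds only one <img> and one <small>.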

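As for which is more efficient, you would have to benchmark it. You can compare the speed of relative XPath queries against a pre-order traversal of the <li> node's subtree.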
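
One way such a comparison could be set up is sketched below. The synthetic 2000-item list and the variable names are assumptions made for illustration, and timings on a real page may look quite different. The XPath side uses DOMXPath::evaluate() with string(), which returns a single value for the context node without an inner loop:

```php
<?php
// Build a synthetic list; 2000 items is an arbitrary size chosen for illustration.
$html = "<ul class='item-list'>";
for ($i = 0; $i < 2000; $i++) {
    $html .= "<li class='item' id='$i'><div class='this-section'>"
           . "<a href='#'><img src='x.png' attribute-one='$i' /></a></div></li>";
}
$html .= "</ul>";

$dom = new DOMDocument;
libxml_use_internal_errors(true);
$dom->loadHTML($html);
libxml_clear_errors();

$xpath = new DOMXPath($dom);
$list  = $xpath->query("//ul[@class='item-list']/li");

// Approach A: one relative XPath expression per <li>, evaluated as a string.
$start = microtime(true);
$viaXpath = [];
foreach ($list as $li) {
    $viaXpath[] = $xpath->evaluate("string(div[@class='this-section']//img/@attribute-one)", $li);
}
$xpathSeconds = microtime(true) - $start;

// Approach B: DOM lookup with getElementsByTagName().
$start = microtime(true);
$viaDom = [];
foreach ($list as $li) {
    $img = $li->getElementsByTagName('img')->item(0);
    $viaDom[] = $img ? $img->getAttribute('attribute-one') : '';
}
$domSeconds = microtime(true) - $start;

printf("XPath evaluate(): %.4f s\n", $xpathSeconds);
printf("getElementsByTagName(): %.4f s\n", $domSeconds);
```

Both approaches should collect the same values, so the only thing left to compare is the elapsed time (and memory_get_peak_usage() if memory is the concern). Run it several times against data shaped like the real page before drawing conclusions.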
