Counting elements with regular expressions

Date: 2012-10-16 09:18:49

Tags: php regex web-scraping

I want to scrape a star-based rating. The relevant markup is:

<div class="product_detail_info_rating_stars">
    <div class="product_detail_star full"></div>
    <div class="product_detail_star full"></div>
    <div class="product_detail_star full"></div>
    <div class="product_detail_star full"></div>
    <div class="product_detail_star"></div>
</div>

Every rating has this snippet. I'm looking for a way to convert these snippets into a number, in this case a 4 (4 out of 5 stars).

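The approach I came up with is to match the whole block for each rating, then match the "full" class inside it and count the occurrences, but maybe there is a better way that I'm not seeing.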
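A rough sketch of that regex idea, to make it concrete. It assumes the markup is exactly as in the snippet above (empty star divs, double-quoted class attributes, no extra attributes); the function name countStarsWithRegex and the $sample variable are made up for illustration.

```php
<?php
// Hypothetical helper: returns one [full, total] pair per rating block.
// Assumes the star divs are empty and the markup matches the snippet exactly.
function countStarsWithRegex(string $html): array
{
    $ratings = [];

    // Step 1: match each rating block and capture the run of star divs inside it.
    $blockPattern = '~<div class="product_detail_info_rating_stars">'
        . '((?:\s*<div class="product_detail_star[^"]*"></div>)+)'
        . '\s*</div>~';
    preg_match_all($blockPattern, $html, $blocks);

    foreach ($blocks[1] as $stars) {
        // Step 2: count the stars carrying the "full" class, and all stars.
        $full  = preg_match_all('~class="product_detail_star full"~', $stars);
        $total = preg_match_all('~class="product_detail_star[^"]*"~', $stars);
        $ratings[] = [$full, $total];
    }

    return $ratings;
}

// Sample input copied from the snippet above.
$sample = <<<'HTML'
<div class="product_detail_info_rating_stars">
    <div class="product_detail_star full"></div>
    <div class="product_detail_star full"></div>
    <div class="product_detail_star full"></div>
    <div class="product_detail_star full"></div>
    <div class="product_detail_star"></div>
</div>
HTML;

foreach (countStarsWithRegex($sample) as $rating) {
    echo 'Rating: ' . $rating[0] . ' of ' . $rating[1] . "\n"; // prints "Rating: 4 of 5"
}
```

This breaks as soon as the site changes attribute order, quoting, or nesting, which is part of why I suspect there is a more robust way.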

Is there a better way to solve this?

Thanks!

1 answer:

Answer 0: (score: 2)

Here is a quick example of how to do this with the SimpleXML parser and XPath.

// Get your page HTML string
$html = file_get_contents('1page.htm');

// To suppress invalid markup warnings
libxml_use_internal_errors(true);

// Load the HTML into a DOMDocument, then import it into SimpleXML
$doc = new DOMDocument();
$doc->strictErrorChecking = false;
$doc->loadHTML($html);
$xml = simplexml_import_dom($doc);

// Find the rating container nodes
$blocks = $xml->xpath('//div[contains(@class, "product_detail_info_rating_stars")]');

foreach ($blocks as $block)
{
    $count = 0;
    foreach ($block->children() as $child) {
        if ($child['class'] == 'product_detail_star full') {
            $count++;
        }
    }
    echo '<pre>'; print_r('Rating: ' . $count . ' of ' . $block->count()); echo '</pre>';
}

// Clear invalid markup error buffer
libxml_clear_errors();

For a test HTML page like this:

<!doctype html>
<html>
<head></head>
<body>

<table>
    <tr>
        <td>
            <div class="product_detail_info_rating_stars">
                <div class="product_detail_star full"></div>
                <div class="product_detail_star"></div>
                <div class="product_detail_star"></div>
                <div class="product_detail_star"></div>
                <div class="product_detail_star"></div>
            </div>
        </td>
    </tr>
    <tr>
        <td>
            <div class="product_detail_info_rating_stars">
                <div class="product_detail_star full"></div>
                <div class="product_detail_star full"></div>
                <div class="product_detail_star"></div>
                <div class="product_detail_star"></div>
                <div class="product_detail_star"></div>
            </div>
        </td>
    </tr>
    <tr>
        <td>
            <div class="product_detail_info_rating_stars">
                <div class="product_detail_star full"></div>
                <div class="product_detail_star full"></div>
                <div class="product_detail_star full"></div>
                <div class="product_detail_star full"></div>
                <div class="product_detail_star"></div>
            </div>
        </td>
    </tr>
</table>

</body>
</html>

It will output something like this:

Rating: 1 of 5
Rating: 2 of 5
Rating: 4 of 5

Adapt it to your needs.
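One possible adaptation, as a minimal sketch rather than a definitive implementation: let XPath do the counting via DOMXPath, and test class names token by token so that a star still counts if its classes appear in a different order or alongside others. The names countStars, $hasClass, and $sample are made up for illustration, and the code assumes PHP 7.1 or later.

```php
<?php
// Hypothetical helper: returns one [full, total] pair per rating block in $html.
function countStars(string $html): array
{
    // Collect invalid-markup warnings internally instead of printing them.
    $previous = libxml_use_internal_errors(true);
    $doc = new DOMDocument();
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);

    // Builds an XPath predicate that matches a class name as a whole token.
    $hasClass = function (string $name): string {
        return 'contains(concat(" ", normalize-space(@class), " "), " ' . $name . ' ")';
    };

    $blocks = $xpath->query('//div[' . $hasClass('product_detail_info_rating_stars') . ']');
    $star = './div[' . $hasClass('product_detail_star') . ']';

    $ratings = [];
    foreach ($blocks as $block) {
        // count() comes back as a float from evaluate(), so cast to int.
        $total = (int) $xpath->evaluate('count(' . $star . ')', $block);
        $full = (int) $xpath->evaluate('count(' . $star . '[' . $hasClass('full') . '])', $block);
        $ratings[] = [$full, $total];
    }

    return $ratings;
}

// Small sample: one rating block with 2 full stars out of 5.
$sample = '<div class="product_detail_info_rating_stars">'
    . str_repeat('<div class="product_detail_star full"></div>', 2)
    . str_repeat('<div class="product_detail_star"></div>', 3)
    . '</div>';

foreach (countStars($sample) as [$full, $total]) {
    echo "Rating: $full of $total\n"; // prints "Rating: 2 of 5" for this sample
}
```

Returning the pairs instead of echoing inside the loop keeps the parsing separate from the output, so the same function can feed a database insert or a CSV export.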