Parsing the DOM with regular expressions

Time: 2014-12-01 09:19:52

Tags: php regex parsing dom

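I have this block of HTML, and I am trying to parse the content of the div whose "stat-label" is "Points". I have already done this for the div whose "stat-label" is "Amount", and it works perfectly.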

preg_match('#\$[0-9]{1,3}(?:,?[0-9]{3})*(?:\.[0-9]{2})?#', $xx1, $output1);
$parts1 = $output1[0];
$val1 = trim(str_replace('$','',$parts1));
$value1= preg_replace('/[\$,]/', '', $val1);

But I cannot work out how to get the value for "Points". Any ideas?

I tried this:

preg_match('/^\\d+(\\.\\d+)?$/D', $xx1, $output2);

The result is:

object(DOMNodeList)#7 (1) {
  ["length"]=>
  int(0)
}


<div class="widget">
    <div class="widget-header">

        <!-- content -->
    </div>

    <div class="widget-content">

        <div class="stat">
            <div class="stat-header">
                <div class="stat-label">
                    <!-- content -->
                </div>
                <div class="stat-value">
                    <!-- content -->
                </div>
            </div>
        </div>
        <hr>

        <div class="stat">
            <div class="stat-header">
                <div class="stat-label">
                    <!-- content -->
                </div>
                <div class="stat-value">
                    <!-- content -->
                </div>
            </div>
        </div>
        <hr>

        <div class="stat">
            <div class="stat-header">
                <div class="stat-label">
                    <!-- content -->
                </div>
                <div class="stat-value">
                    <!-- content -->
                </div>
            </div>
        </div>
        <hr>

        <div class="stat">
            <div class="stat-header">
                <div class="stat-label">
                    Amount
                </div>
                <div class="stat-value">
                    <font color="green">$</font>123,456,678,012 </div>
            </div>
        </div>
        <hr>

        <div class="stat">
            <div class="stat-header">
                <div class="stat-label">
                    Points
                </div>
                <div class="stat-value">
                    12.14 </div>
            </div>
        </div>
        <hr>

        <div class="stat">
            <div class="stat-header">
                <div class="stat-label">
                    <!-- content -->
                </div>

                <div class="stat-value">
                    <!-- content -->
                </div>
            </div>
        </div>
        <hr>

        <div class="stat">
            <div class="stat-header">
                <div class="stat-label">
                    <!-- content -->
                </div>
                <div class="stat-value">
                    <!-- content -->
                </div>
            </div>
        </div>
    </div>
</div>

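2 answers:

Answer 0 (score: 0)

12.14 is surrounded by whitespace that your regex does not expect. Either call trim() on the string first, or do not use the ^ and $ anchors.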
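
Both suggestions can be sketched roughly as follows. This assumes `$text` holds the text content of the "Points" stat-value div, whitespace included; the variable names are illustrative and do not come from the question.

```php
<?php
// Assumed input: the text of the "Points" stat-value div, with the
// surrounding whitespace it has in the HTML from the question.
$text = "\n                    12.14 ";

// Option 1: trim first, then keep the anchored pattern.
$trimmed = null;
if (preg_match('/^\d+(\.\d+)?$/D', trim($text), $m)) {
    $trimmed = $m[0];
}

// Option 2: drop the ^ and $ anchors and let the pattern find the number.
$unanchored = null;
if (preg_match('/\d+(?:\.\d+)?/', $text, $m)) {
    $unanchored = $m[0];
}

echo $trimmed, "\n";    // 12.14
echo $unanchored, "\n"; // 12.14
```

Option 1 is stricter: it rejects input that contains anything besides the number, whereas option 2 returns the first number it finds anywhere in the string.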

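Answer 1 (score: 0)

So, after looking into PHP's DOM parsing capabilities, I have given up on using regular expressions to parse HTML altogether.

Here is how I solved the problem above: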

<?php
$login_data=  http_build_query(array('username'=>$username,'password'=>$password));
$html = _curl("http://example.com/getinfo.php",'POST',$login_data); // this is a curl function I use

$dom = new DOMDocument();
$dom->loadHTML($html);
$els = $dom->getElementsByTagName('*');
$child = 0;
$myAmount = 0;
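foreach ($els as $el) {
    $firstChild = $el->firstChild;
    $child++;
    // The 96th element is the one whose first child holds the value;
    // this index was found by inspection and breaks if the page layout changes.
    if ($child === 96 && $firstChild instanceof DOMText) {
        $myAmount = trim($firstChild->wholeText);
    }
}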

echo $myAmount; // outputs 12.14! 

?>

So that at least answers it for me. See the link in the comments above.
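
Counting elements up to a fixed index is fragile, so a possible alternative is to select the value by its label with DOMXPath. The sketch below is not the answerer's method: `statValue()` is a hypothetical helper, the HTML is a trimmed-down copy of the markup from the question, and it assumes the label and value divs are siblings with exactly these class attributes.

```php
<?php
// Trimmed-down copy of the markup from the question.
$html = <<<'HTML'
<div class="widget-content">
    <div class="stat">
        <div class="stat-header">
            <div class="stat-label">
                Amount
            </div>
            <div class="stat-value">
                <font color="green">$</font>123,456,678,012 </div>
        </div>
    </div>
    <div class="stat">
        <div class="stat-header">
            <div class="stat-label">
                Points
            </div>
            <div class="stat-value">
                12.14 </div>
        </div>
    </div>
</div>
HTML;

// Hypothetical helper (not from the original answer): returns the trimmed
// text of the stat-value div that follows the stat-label div named $label.
// Assumes $label contains no double quotes.
function statValue(DOMXPath $xpath, string $label): ?string
{
    $query = sprintf(
        '//div[@class="stat-label"][normalize-space(.)="%s"]'
        . '/following-sibling::div[@class="stat-value"]',
        $label
    );
    $nodes = $xpath->query($query);
    if ($nodes === false || $nodes->length === 0) {
        return null; // label not found
    }
    return trim($nodes->item(0)->textContent);
}

$dom = new DOMDocument();
libxml_use_internal_errors(true); // collect parser warnings instead of printing them
$dom->loadHTML($html);
libxml_clear_errors();

$xpath = new DOMXPath($dom);
$points = statValue($xpath, 'Points');
$amount = statValue($xpath, 'Amount');

echo $points, "\n"; // 12.14
echo $amount, "\n"; // $123,456,678,012
```

Because the lookup is keyed on the label text rather than on element position, it should keep working if unrelated elements are added to or removed from the page.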