PHP with DOM XPath - remove a childNode and arrange a string

Time: 2016-02-01 15:26:13

Tags: php domxpath

I have this HTML structure:



<html>
  <body>
    <section>
      <div>
        <div>
          <section>
            <div>
              <table>
                <tbody>
                  <tr></tr>
                  <tr>
                    <td></td>
                    <td></td>
                    <td>
                      <i></i>
                      <div class="first-div class-one">
                        <div class="second-div"> soft </div>
                        130 cm / 15cm
                      </div>
                    </td>
                  </tr>
                  <tr></tr>
                </tbody>
              </table>
            </div>
          </section>
        </div>
      </div>
    </section>
  </body>
</html>

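My XPath code gives me the result 'soft 130 cm / 15cm'.

But I want to know how to get only '15', so I need:

1. To know how to get rid of the childNode->nodeValue (the 'soft' text of the inner div).

2. Once I have '130 cm / 15cm', to know how to get only '15' into a PHP variable as the nodeValue.

Can you help? Thanks in advance.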

1 answer:

Answer 0 (score: 1)

Text inside a tag is also a node (a child node), specifically a DOMText. By looking at the div's children, you can find the DOMText and read its nodeValue. Example:

$doc = new DOMDocument();
$doc->loadHTML("<html><body><p>bah</p>Test</body></html>");
echo $doc->saveHTML();

$xpath = new DOMXPath($doc);
$nodelist = $xpath->query('/html/body');
foreach ($nodelist as $node) {
    // Text directly inside <body> is a DOMText child; <p> is a DOMElement
    foreach ($node->childNodes as $child) {
        if ($child instanceof DOMText) {
            echo $child->nodeValue . "\n"; // outputs "Test"
        }
    }
}
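As an alternative to looping over childNodes and testing `instanceof DOMText`, XPath can select the text nodes directly with `text()`. The sketch below reuses the small document from the example above; the `normalize-space()` predicate is an added assumption to skip whitespace-only text nodes, and `$values` is just an illustrative variable name.

```php
<?php
// Sketch: let XPath return only the text-node children of <body>,
// so no instanceof DOMText check is needed in PHP.
$doc = new DOMDocument();
$doc->loadHTML("<html><body><p>bah</p>Test</body></html>");

$xpath = new DOMXPath($doc);
// text() selects direct text children only; the <p> element and its
// "bah" text are skipped. normalize-space() drops whitespace-only nodes.
$texts = $xpath->query('/html/body/text()[normalize-space()]');

$values = [];
foreach ($texts as $textNode) {
    $values[] = trim($textNode->nodeValue);
}
echo implode("\n", $values), "\n"; // prints "Test"
```

Applied to the question's markup, the same idea means appending `/text()` to the query for the outer div, so the nested div's "soft" text never comes back.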

Your second point can easily be done with a regular expression:

$string = "130 cm / 15cm";

// Capture the digits between "/ " and the trailing "cm"
if (preg_match('|/ ([0-9]+) ?cm$|', $string, $matches)) {
    echo $matches[1]; // 15
}
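If both numbers are needed, or the spacing around "cm" and "/" is not always the same, a slightly looser pattern can capture them together. This is a sketch; `parseDimensions` is a hypothetical helper name, and the whitespace tolerance is an assumption about what other inputs might look like.

```php
<?php
// Hypothetical helper: parse "A cm / B cm" into two integers,
// tolerating optional spaces around "cm" and "/".
function parseDimensions(string $text): ?array
{
    if (preg_match('~(\d+)\s*cm\s*/\s*(\d+)\s*cm~i', $text, $m) !== 1) {
        return null; // no match: avoid reading undefined $m[1] / $m[2]
    }
    return [(int) $m[1], (int) $m[2]];
}

[$first, $second] = parseDimensions("130 cm / 15cm");
echo $second, "\n"; // prints 15
```

Returning `null` on a non-matching string makes the failure explicit instead of producing an "undefined offset" notice when `$matches[1]` is read.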

Full solution:

<?php

$strhtml = '
<html>
  <body>
    <section>
      <div>
        <div>
          <section>
            <div>
              <table>
                <tbody>
                  <tr></tr>
                  <tr>
                    <td></td>
                    <td></td>
                    <td>
                      <i></i>
                      <div class="first-div class-one">
                        <div class="second-div"> soft </div>
                        130 cm / 15cm
                      </div>
                    </td>
                  </tr>
                  <tr></tr>
                </tbody>
              </table>
            </div>
          </section>
        </div>
      </div>
    </section>
  </body>
</html>';

$doc = new DOMDocument();
// libxml warns about HTML5 tags such as <section>; collect the warnings
// instead of suppressing them with @
libxml_use_internal_errors(true);
$doc->loadHTML($strhtml);
libxml_clear_errors();
echo $doc->saveHTML();

$xpath = new DOMXPath($doc);
$nodelist = $xpath->query('/html/body/section/div/div/section/div/table/tbody/tr[2]/td[3]/div');
foreach ($nodelist as $node) {
    // Only the div's own text nodes; the "soft" text belongs to the nested div
    foreach ($node->childNodes as $child) {
        if (!($child instanceof DOMText)) {
            continue;
        }
        $text = trim($child->nodeValue);
        if ($text === '') {
            continue; // skip whitespace-only text nodes
        }
        echo 'Raw: ' . $text . "\n";
        if (preg_match('|/ ([0-9]+) ?cm$|', $text, $matches)) {
            echo 'Value: ' . $matches[1] . "\n"; // 15
        }
    }
}
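The absolute path above breaks as soon as any wrapper element changes. A sketch of a less brittle variant selects the div by its `first-div` class (the `concat`/`normalize-space` idiom matches one class among several, such as "first-div class-one"), reads only its own text nodes, and returns the number after the slash as an int. `extractSecondValue` is a hypothetical name, and the sample HTML is a trimmed-down assumption of the question's markup.

```php
<?php
// Sketch: select the div by class instead of by absolute path, read only
// its own text nodes, and return the number after "/" as an int (or null).
function extractSecondValue(string $html): ?int
{
    $doc = new DOMDocument();
    // HTML5 tags such as <section> make libxml emit warnings; collect them
    // instead of printing them, then restore the previous setting.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);
    $query = '//div[contains(concat(" ", normalize-space(@class), " "), " first-div ")]'
           . '/text()[normalize-space()]';
    foreach ($xpath->query($query) as $textNode) {
        if (preg_match('~/\s*(\d+)\s*cm\s*$~i', trim($textNode->nodeValue), $m) === 1) {
            return (int) $m[1];
        }
    }
    return null; // no matching div or no "/ N cm" text
}

$html = '<table><tr><td><i></i><div class="first-div class-one">'
      . '<div class="second-div"> soft </div> 130 cm / 15cm </div></td></tr></table>';
echo extractSecondValue($html), "\n"; // prints 15
```

Using `libxml_use_internal_errors()` instead of `@` keeps the parse warnings available through `libxml_get_errors()` if they are ever needed.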