How to parse HTML and store the elements as an array

Date: 2015-01-30 13:17:57

Tags: php html regex parsing dom

I have the following HTML:

<div class="wwrcm-tab-info wwrcm-cf wwrcm-last">
  <div class="wwrcm-info">
    <h2 class="wwrcm-text-gray">Instant office.</h2>
    <p class="wwrcm-text-gray-light">Just click your Surface Pro 3 into the dock to go from tablet to full desktop PC. With an Ethernet port, Mini DisplayPort and five USB ports – three USB 3.0 and two USB 2.0 ports – you can attach your HD monitor, full-size keyboard, printer and more.</p>
    <h2 class="wwrcm-text-gray">All powerful.</h2>
    <p class="wwrcm-text-gray-light">Docking Station delivers plenty of power at 48W. You can work on your device, run or charge your favourite accessories, and still have ample power to charge your Surface Pro 3 battery.</p>
    <h2 class="wwrcm-text-gray">Product Features</h2>
    <p class="wwrc-feature-p wwrcm-text-gray-light"><strong>Mini DisplayPort Video Output</strong><br/>The mini DisplayPort connection delivers high-definition video resolution of up to 3840 x 2600 DPI.</p>
    <p class="wwrc-feature-p wwrcm-text-gray-light"><strong>USB Ports</strong><br/>Docking Station includes five USB ports – three USB 3.0 and two USB 2.0 ports. Transfer large files to an external drive, plug in a USB printer or headset, charge multiple accessories, and more.</p>
    <p class="wwrc-feature-p wwrcm-text-gray-light"><strong>Gigabit Ethernet Port</strong><br/>The gigabit Ethernet connection is super fast, with data transfer rates of up to 1 billion bits per second&#185;.</p>
    <p class="wwrcm-text-gray-light"><strong>48W Power Supply</strong><br/>The 48W power supply quickly recharges your Surface battery while you work, so you can hit the road or the halls in no time with a fully-charged device.</p> 
    <h2 class="wwrcm-text-gray">Summary</h2>
    <ul class="wwrcm-text-gray-light">
      <li>Transform your Surface Pro 3 into a complete desktop workstation</li>
      <li>Connect to your favourite accessories</li>
      <li>Power and charge your Surface Pro 3</li>
    </ul>
  </div>
</div>

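I want to parse the HTML above and display each h2 value followed by its p values. I want to store the result as an array, with the h2 text as the key and the <p> text as the value.

I tried using xpath->query and regular expressions, but could not get the values to display.

Can you tell me how to handle this?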

1 Answer:

Answer 0: (score: 1)

Try http://simplehtmldom.sourceforge.net/

`

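// Assumes simple_html_dom.php is included and $source (a name chosen here) holds the HTML above.
$html = str_get_html($source);
$arr = [];
// The headings in the question are <h2>, not <h1>.
foreach ($html->find('h2') as $header) {
    $nextSibling = $header->nextSibling();
    // Keep only headings that are immediately followed by a paragraph.
    if (!empty($nextSibling) && $nextSibling->tag === 'p') {
        $arr[$header->plaintext] = $nextSibling->plaintext;
    }
}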

`
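
Note that looking only at the element directly after each heading keeps a single paragraph per h2, so "Product Features" would lose three of its four paragraphs and "Summary" (which is followed by a ul) would be skipped. If every value under a heading is needed, one possible approach is to walk the siblings until the next h2. The sketch below uses PHP's built-in DOM extension instead of Simple HTML DOM. The function name `groupByHeading` is hypothetical, and treating li items as values is an assumption about what the asker wants:

```php
<?php
// Hypothetical helper (not from the original answer): maps each <h2> text to
// the text of every <p> or <li> that follows it before the next <h2>.
function groupByHeading(string $source): array
{
    $doc = new DOMDocument();
    // Collect parser warnings instead of printing them.
    libxml_use_internal_errors(true);
    // The XML prolog is a common workaround that makes loadHTML read UTF-8.
    $doc->loadHTML('<?xml encoding="UTF-8">' . $source);
    libxml_clear_errors();

    $xpath = new DOMXPath($doc);
    $result = [];
    foreach ($xpath->query('//h2') as $heading) {
        $key = trim($heading->textContent);
        $result[$key] = [];
        // Walk the following siblings until the next <h2> starts a new group.
        for ($node = $heading->nextSibling; $node !== null; $node = $node->nextSibling) {
            if ($node->nodeType !== XML_ELEMENT_NODE) {
                continue; // skip whitespace text nodes
            }
            if ($node->nodeName === 'h2') {
                break;
            }
            if ($node->nodeName === 'p') {
                $result[$key][] = trim($node->textContent);
            } elseif ($node->nodeName === 'ul') {
                foreach ($node->getElementsByTagName('li') as $item) {
                    $result[$key][] = trim($item->textContent);
                }
            }
        }
    }
    return $result;
}
```

Calling `print_r(groupByHeading($source));` with `$source` holding the HTML from the question should give one array entry per heading. Headings with identical text would overwrite each other, since the heading text is used as the array key.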