Parsing an HTML DOM with PHP

Asked: 2017-05-27 13:55:44

Tags: php html parsing dom html-parsing

I want to use PHP to read the records in the HTML below into variables. What is the best way to do this? The sample HTML represents two records.

Desired result for record 1:


    rank = 1
    tag = LLG8V2QQ
    name = Pat
    level = 11
    league = 1
    trophies = 4154
    donations = 578
    role = Elder

  <div class="clan__rowContainer">
    <div class="clan__row">
                        #1
                </div>
    <div class="clan__row">
      <a class="ui__blueLink" href='/profile/LLG8V2QQ'>Pat</a>
    </div>
    <div class="clan__row">
      <span class="clan__playerLevel">11</span>
    </div>
    <div class="clan__row">
      <div class="clan__leagueContainer">
                        <div class="league__1"></div>
                    </div>
    </div>
    <div class="clan__row">
      <div class="clan__cup">4154</div>
    </div>
    <div class="clan__row">578</div>
    <div class="clan__row">
         Elder
                </div>
  </div>


  <div class="clan__rowContainer">
    <div class="clan__row">
                        #2
                </div>
    <div class="clan__row">
      <a class="ui__blueLink" href='/profile/299GGR2J'>Erikson</a>
    </div>
    <div class="clan__row">
      <span class="clan__playerLevel">11</span>
    </div>
    <div class="clan__row">
      <div class="clan__leagueContainer">
                        <div class="league__1"></div>
                    </div>
    </div>
    <div class="clan__row">
      <div class="clan__cup">4081</div>
    </div>
    <div class="clan__row">248</div>
    <div class="clan__row">
         Member
                </div>
  </div>

2 answers:

Answer 0 (score: 1)

You can try PHP's DOMDocument class; see its documentation in the PHP manual. You could start with the following:

<?php
$doc = new DOMDocument();
$doc->loadHTMLFile("filename.html");
//or $doc->loadHTML("<html><body>Test<br></body></html>");

Then iterate over the document's nodes and their children:

foreach ($doc->childNodes as $item) {
    //... some code
}
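
Because `$doc->childNodes` only holds the top level of the tree, reaching nested elements takes a recursive walk. The following is a minimal sketch of that idea, not part of the original answer; the helper name `collectElementNames` and the inline sample markup are assumptions made for illustration.

```php
<?php
// Hypothetical helper: collects the tag names of all element nodes, depth-first.
function collectElementNames(DOMNode $node, array &$names): void
{
    if (!$node->hasChildNodes()) {
        return; // text nodes, comments and the doctype have nothing to descend into
    }
    foreach ($node->childNodes as $child) {
        if ($child->nodeType === XML_ELEMENT_NODE) {
            $names[] = $child->nodeName;
        }
        collectElementNames($child, $names);
    }
}

$doc = new DOMDocument();
// Illustrative markup only; in practice this would be the real page.
$doc->loadHTML('<html><body><div class="clan__row"><span>11</span></div></body></html>');

$names = [];
collectElementNames($doc, $names);
print_r($names);
```

For the question's markup, the same walk would visit every `clan__row` div, but a targeted query is usually less work than filtering nodes by hand.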

Answer 1 (score: 1)

My code so far:

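    <?php
        // Fetch the page; 'http://myURL' is a placeholder for the real address.
        $_document = file_get_contents('http://myURL');

        $dom = new DOMDocument;
        // Set before loading; assigning it after the document is parsed cannot change the parse.
        $dom->preserveWhiteSpace = false;
        $dom->loadHTML($_document);
        $divs = $dom->getElementsByTagName('div');
        foreach ($divs as $div) {
            $class = $div->getAttribute('class');
            if ($class == 'clan__rowContainer') {
                // NO IDEA WHAT NOW
            }
        }
    ?>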
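
One possible way to fill in the missing step is to query each container's rows with `DOMXPath` and read the fields by position. The sketch below is not from the thread. It assumes every record has exactly the seven `clan__row` children shown in the question, in the same order, and it uses an inline copy of the first record in place of the real page.

```php
<?php
// Sample markup copied from the question (first record, whitespace condensed).
$html = <<<'HTML'
<div class="clan__rowContainer">
  <div class="clan__row"> #1 </div>
  <div class="clan__row"><a class="ui__blueLink" href='/profile/LLG8V2QQ'>Pat</a></div>
  <div class="clan__row"><span class="clan__playerLevel">11</span></div>
  <div class="clan__row"><div class="clan__leagueContainer"><div class="league__1"></div></div></div>
  <div class="clan__row"><div class="clan__cup">4154</div></div>
  <div class="clan__row">578</div>
  <div class="clan__row"> Elder </div>
</div>
HTML;

libxml_use_internal_errors(true); // collect parser warnings instead of printing them
$dom = new DOMDocument();
$dom->loadHTML($html);
$xpath = new DOMXPath($dom);

$records = [];
foreach ($xpath->query('//div[@class="clan__rowContainer"]') as $container) {
    // Direct child rows in document order:
    // rank, name/tag, level, league, trophies, donations, role (assumed layout).
    $rows   = $xpath->query('./div[@class="clan__row"]', $container);
    $link   = $xpath->query('.//a', $rows->item(1))->item(0);
    $league = $xpath->query('.//div[starts-with(@class, "league__")]', $rows->item(3))->item(0);

    $records[] = [
        'rank'      => (int) ltrim(trim($rows->item(0)->textContent), '#'),
        'tag'       => basename($link->getAttribute('href')),
        'name'      => trim($link->textContent),
        'level'     => (int) trim($rows->item(2)->textContent),
        'league'    => (int) substr($league->getAttribute('class'), strlen('league__')),
        'trophies'  => (int) trim($rows->item(4)->textContent),
        'donations' => (int) trim($rows->item(5)->textContent),
        'role'      => trim($rows->item(6)->textContent),
    ];
}

print_r($records);
```

Reading fields by position is brittle if the site changes its layout, so checking that each container yields the expected number of rows before indexing into it would be a sensible addition for real pages.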