Change/replace a specific part of the URLs in a parsed table

Date: 2018-04-04 19:15:54

Tags: php dom simple-html-dom

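I am parsing a table with simple_html_dom. So far so good. Now I want to change all the links inside the table cells. They currently end in ".htm" and should end in ".php", so that each link points to the same file name but with a different file type. Because the content of the source file changes constantly, the solution must not depend on the file names.

Example: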

<td><a href="www.website.com/site1.htm" ... --> <td><a href="www.website.com/site1.php"

This is my current code:

// Download simple_html_dom.php first from http://simplehtmldom.sourceforge.net/
require_once('simple_html_dom.php');
// Get the contents of the HTML document either using cURL, a crawling
// framework, or use the provided file_get_html() function.
$html = file_get_html('mywebsite/example.htm');


// Table 1
    $table = $html->find('table', 1);
    $rowData = array();

    foreach($table->find('tr') as $row) {
        // initialize array to store the cell data from each row
        $flight = array();
        foreach($row->find('td') as $cell) {
            // push the cell's text to the array
            $flight[] = $cell->innertext;
        }
        foreach($row->find('th') as $cell) {
            // push the cell's text to the array
            $flight[] = $cell->innertext;
        }
        $rowData[] = $flight;
    }
    foreach ($rowData as $row => $tr) {
        echo '<tr>';
        foreach ($tr as $td)
            echo '<td>' . $td .'</td>';
        echo '</tr>';
    }

The source looks like this:

<table><hr>
<tr><th>po</th><th>player</th><th>age</th><th>2ga</th><th>2g%</th><th>fta</th><th>ft%</th><th>3ga</th><th>3g%</th><th>orb</th><th>drb</th><th>ast</th><th>stl</th><th>to</th><th>blk</th><th>o-o</th><th>d-o</th><th>p-o</th><th>t-o</th><th>o-d</th><th>d-d</th><th>p-d</th><th>t-d</th></tr>
<tr><td CLASS=tdp>PG</td><td CLASS=tdp><a href="JamesHarden7.htm">James Harden                    </a></td><td>27</td><td>48</td><td>53</td><td>95</td><td>85</td><td>85</td><td>35</td><td>20</td><td>59</td><td>99</td><td>57</td><td>1</td><td>12</td><td>4</td><td>9</td><td>7</td><td>9</td><td>8</td><td>6</td><td>5</td><td>7</td></tr>
<tr><td CLASS=tdp>PG</td><td CLASS=tdp><a href="TerryRozier1.htm">Terry Rozier                    </a></td><td>22</td><td>31</td><td>41</td><td>15</td><td>77</td><td>43</td><td>32</td><td>18</td><td>42</td><td>31</td><td>46</td><td>79</td><td>8</td><td>5</td><td>4</td><td>4</td><td>2</td><td>6</td><td>5</td><td>4</td><td>6</td></tr>
<tr><td CLASS=tdp>SG</td><td CLASS=tdp><a href="DannyGreen6.htm">Danny Green  

and so on...

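2 answers:

Answer 0 (score: 0)

For your example, you can use find("td a") to get the anchors.

You can then loop over the results with foreach and replace the last 3 characters of each href attribute with php, for example using substr_replace: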

$html = <<<HTML
 <table><hr>
<tr><th>po</th><th>player</th><th>age</th><th>2ga</th><th>2g%</th><th>fta</th><th>ft%</th><th>3ga</th><th>3g%</th><th>orb</th><th>drb</th><th>ast</th><th>stl</th><th>to</th><th>blk</th><th>o-o</th><th>d-o</th><th>p-o</th><th>t-o</th><th>o-d</th><th>d-d</th><th>p-d</th><th>t-d</th></tr>
<tr><td CLASS=tdp>PG</td><td CLASS=tdp><a href="JamesHarden7.htm">James Harden                    </a></td><td>27</td><td>48</td><td>53</td><td>95</td><td>85</td><td>85</td><td>35</td><td>20</td><td>59</td><td>99</td><td>57</td><td>1</td><td>12</td><td>4</td><td>9</td><td>7</td><td>9</td><td>8</td><td>6</td><td>5</td><td>7</td></tr>
<tr><td CLASS=tdp>PG</td><td CLASS=tdp><a href="TerryRozier1.htm">Terry Rozier                    </a></td><td>22</td><td>31</td><td>41</td><td>15</td><td>77</td><td>43</td><td>32</td><td>18</td><td>42</td><td>31</td><td>46</td><td>79</td><td>8</td><td>5</td><td>4</td><td>4</td><td>2</td><td>6</td><td>5</td><td>4</td><td>6</td></tr>
</table>
HTML;

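// Parse the string with simple_html_dom (requires simple_html_dom.php).
$html = str_get_html($html);

// Replace the last three characters ("htm") of each link's href with "php".
foreach ($html->find("td a") as $a) {
    $a->href = substr_replace($a->href, 'php', -3);
}

// Print the modified markup.
echo $html;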
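
One caveat with replacing the last three characters is that it rewrites every href, whatever it ends in, so a link such as "notes.html" would be mangled. A minimal sketch of a stricter variant follows, assuming only links that really end in ".htm" should change. The helper name htmToPhp is hypothetical and not part of simple_html_dom.

```php
<?php
// Hypothetical helper (not part of simple_html_dom): swap a trailing ".htm"
// for ".php" and return any other string unchanged.
function htmToPhp(string $href): string
{
    // "\.htm$" only matches at the very end of the string;
    // the "i" flag also accepts ".HTM".
    return preg_replace('/\.htm$/i', '.php', $href);
}

echo htmToPhp('JamesHarden7.htm'), "\n";          // JamesHarden7.php
echo htmToPhp('www.website.com/site1.htm'), "\n"; // www.website.com/site1.php
echo htmToPhp('notes.html'), "\n";                // notes.html (unchanged)
```

Inside the loop, the assignment would then read $a->href = htmToPhp($a->href);. Links that carry a query string or fragment after ".htm" are deliberately left alone in this sketch.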

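Answer 1 (score: 0)

You can use DOMDocument instead and use XPath to find every reference to your website. This code loads the HTML and finds every href attribute that contains the website as part of its URL. It then changes the .htm extension to .php.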

$dom = new DOMDocument();
$dom->loadHTML($html);

$xp = new DOMXPath($dom);
$links = $xp->query("//@href[contains(.,'www.website.com')]");
foreach ( $links as $href )    {
    $href->nodeValue = str_replace(".htm", ".php", $href->nodeValue);
}

echo $dom->saveHTML();
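
Note that the links in the question's sample table are relative ("JamesHarden7.htm") and do not contain 'www.website.com', so the XPath filter above would not match them. The sketch below is one possible adaptation, assuming the goal is to rewrite every link inside a table cell whose href ends in ".htm". The sample markup is shortened from the question, and the last row is made up purely to show a link that stays untouched.

```php
<?php
// Shortened sample markup; the last row is invented for illustration.
$html = <<<HTML
<table>
<tr><th>po</th><th>player</th><th>age</th></tr>
<tr><td>PG</td><td><a href="JamesHarden7.htm">James Harden</a></td><td>27</td></tr>
<tr><td>PG</td><td><a href="TerryRozier1.htm">Terry Rozier</a></td><td>22</td></tr>
<tr><td>--</td><td><a href="http://example.com/about.html">About</a></td><td>--</td></tr>
</table>
HTML;

// Collect libxml parse warnings internally instead of printing them.
libxml_use_internal_errors(true);

$dom = new DOMDocument();
$dom->loadHTML($html);

$xp = new DOMXPath($dom);

// Select every <a> element with an href attribute inside a table cell.
foreach ($xp->query('//td//a[@href]') as $a) {
    // Rewrite only a trailing ".htm"; other values pass through unchanged.
    $new = preg_replace('/\.htm$/i', '.php', $a->getAttribute('href'));
    $a->setAttribute('href', $new);
}

// saveHTML() returns the whole document, including the wrapper elements
// that loadHTML() adds around the fragment.
echo $dom->saveHTML();
```

Using setAttribute on the element rather than assigning to the attribute node's nodeValue is a design choice of this sketch. Both approaches update the link.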