Editing HTML with simple_html_dom.php

Time: 2013-11-19 15:05:24

Tags: php web-scraping simple-html-dom

I am using simple_html_dom.php to scrape and edit/manipulate the following:

<?php
include('simple_html_dom.php');
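// Read the optional "name" query parameter (empty string if absent)
$name = isset($_GET["name"]) ? $_GET["name"] : "";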

$html_code="https://hwb.wales.gov.uk/Home/Pages/Home.aspx";
$html_code= $html_code.$name."/?lang=en";

echo $html_code;


$html = file_get_html($html_code);

echo "<html>";
echo "<head>";
echo "<meta charset='UTF-8'>";
echo  "<title>PHP Test</title>";
echo " </head>";
echo " <body>";


foreach($html->find('#LatestNewsArts') as $e)
   // Code here to prepend hwb.wales.gov.uk to <img src="/   >
  echo $e->innertext . '<br>';

echo " </body>";
echo "</html>";

?>

I can extract the <div> I am looking for and echo it, which works fine.

Where I have hit a wall (my PHP-fu fails me) is how to intercept and edit the HTML in $e that I have scraped.

What I want to do is replace the <img src="/...."> tags with <img src="hwb.wales.gov.uk/....">

1 Answer:

Answer 0: (score: 0)

An attribute can easily be given a new value like this: $elmt->attribute = NewValue

Here is working code that answers your question:

// includes Simple HTML DOM Parser
include "simple_html_dom.php";

$html_code="https://hwb.wales.gov.uk/Home/Pages/Home.aspx";

// => I don't know what $name stands for. It's up to you to change this code to suit your needs
//$html_code= $html_code.$name."/?lang=en";

echo $html_code;

$html = file_get_html($html_code);

echo "<html>";
echo "<head>";
echo "<meta charset='UTF-8'>";
echo  "<title>PHP Test</title>";
echo " </head>";
echo " <body>";

// Loop through all elements with id="Article" inside #LatestNewsArts
foreach($html->find('#LatestNewsArts #Article') as $e){
    // Take the first <img> in this article; skip the rewrite if there is none
    $img = $e->find("img",0);
    if ($img) {
        // Set src to the new absolute URL
        $img->src = "https://hwb.wales.gov.uk" . $img->src;
    }

    // Print the outertext
    echo $e->outertext . '<br>';
}

echo " </body>";
echo "</html>";


// Clear dom object
$html->clear(); 
unset($html);
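
The loop above prepends the host to the src of the first image without checking what that src contains. If some images on the page already carry an absolute URL, the result would be a doubled host. A minimal sketch of a guard follows; the helper name absolutize_src and its default base are assumptions made for illustration and are not part of simple_html_dom.

```php
<?php
// Hypothetical helper (not part of simple_html_dom): turn a root-relative
// path into an absolute URL against an assumed base host, and leave every
// other kind of value untouched.
function absolutize_src($src, $base = 'https://hwb.wales.gov.uk')
{
    // A missing attribute may not be a string; pass it through unchanged.
    if (!is_string($src)) {
        return $src;
    }
    // Protocol-relative URLs ("//host/x.png") already name a host.
    if (strpos($src, '//') === 0) {
        return $src;
    }
    // Root-relative paths ("/img/x.png") get the base host prepended.
    if (strpos($src, '/') === 0) {
        return rtrim($base, '/') . $src;
    }
    // Absolute URLs, data: URIs and document-relative paths pass through.
    return $src;
}

echo absolutize_src('/PublishingImages/news.png'), "\n";
echo absolutize_src('https://example.org/a.png'), "\n";
```

Inside the loop this could be applied to every image in the element rather than only the first, for example: foreach ($e->find('img') as $img) { $img->src = absolutize_src($img->src); }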
