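I am trying to insert text extracted from a web page into a database, but it is not being inserted. I use an XPath expression to extract the data, and the data on the web page sits inside multiple HTML paragraph or list-item tags.
Here is the code: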
<?php
set_time_limit(0);
$dbhost = "localhost";
$dbuser = "root";
$dbpass = "";
$dbname = "olx";
$conn = mysql_connect($dbhost, $dbuser, $dbpass) or die ("Error connecting to database");
mysql_select_db($dbname, $conn);
$res1 = mysql_query("SELECT * FROM `item_url` WHERE id=10");
while($r1 = mysql_fetch_array($res1))
{
    $url = $r1['url'];
    $html = file_get_contents($url);
    $doc = new DOMDocument();
    @$doc->loadHTML($html);
    $xpath = new DOMXPath($doc);
    $details = $xpath->evaluate("//div[@id='description-text']/child::div");
    foreach ($details as $detail) {
        $nodes = $detail->childNodes;
        foreach ($nodes as $node) {
            $string = $node->nodeValue;
            $string = preg_replace('/[^a-zA-Z0-9@.\-]/', ' ', $string); //allow required character
            $string = strip_tags($string); //remove html tags
            echo $string . '<br>';
        }
    }
    mysql_query("INSERT INTO `test` (`detail`) VALUES ('$string')") or die(mysql_error());
}
?>
It displays the data like this:
Performs skilled technical work in the maintenance, repair, replacement, and installation of air conditioning systems.
Installs, troubleshoots and repairs air conditioning units.
Replaces expansion valves, compressors, motors, coil units and other component parts.
Technicians work in residential homes, schools, hospitals, office buildings, or factories.
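I am unable to insert this data into the database. Is this a problem with the XPath nodes? Each line is inside a <p> tag on the web page.
Below is the HTML of the web page: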
<div id="description-text">
<h2 class="title-desc">
<span>Ad details</span>
</h2>
<ul class="item-optionals">
<li style="background-color: rgb(251, 251, 251);">
</ul>
<div style="padding-right: 30px; width: 388px;">
<p> Performs skilled technical work in the maintenance, repair, replacement, and installation of air conditioning systems.</p>
<p>Installs, troubleshoots and repairs air conditioning units.</p>
<p>Replaces expansion valves, compressors, motors, coil units and other component parts.</p>
<p>Technicians work in residential homes, schools, hospitals, office buildings, or factories.</p>
</div>
</div>
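Answer 0 (score: 0)
Your code is fine. The only problem is that you call mysql_query after the foreach loops have finished, so it runs once per page with whatever the last node left in $string, rather than once per node. To fix this, it is enough to move the mysql_query call inside the innermost foreach loop: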
<?php
set_time_limit(0);
$dbhost = "localhost";
$dbuser = "root";
$dbpass = "";
$dbname = "olx";
$conn = mysql_connect($dbhost, $dbuser, $dbpass) or die ("Error connecting to database");
mysql_select_db($dbname, $conn);
$res1 = mysql_query("SELECT * FROM `item_url` WHERE id=10");
while($r1 = mysql_fetch_array($res1))
{
    $url = $r1['url'];
    $html = file_get_contents($url);
    $doc = new DOMDocument();
    @$doc->loadHTML($html);
    $xpath = new DOMXPath($doc);
    $details = $xpath->evaluate("//div[@id='description-text']/child::div");
    foreach ($details as $detail) {
        $nodes = $detail->childNodes;
        foreach ($nodes as $node) {
            $string = $node->nodeValue;
            $string = preg_replace('/[^a-zA-Z0-9@.\-]/', ' ', $string); //allow required character
            $string = strip_tags($string); //remove html tags
            if (trim($string) === '') {
                continue; //skip whitespace-only text nodes between the <p> elements
            }
            echo $string . '<br>';
            //table and column names are quoted with backticks in MySQL, not single quotes
            mysql_query("INSERT INTO `test` (`detail`) VALUES ('$string')") or die(mysql_error());
        }
    }
}
?>
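As a side note, the mysql_* functions were removed in PHP 7.0, so the code above only runs on older PHP versions. The following is a minimal sketch of the same idea, not a drop-in replacement: it selects the <p> elements directly with XPath and inserts each one through a PDO prepared statement. The function names extractParagraphs and insertDetails are hypothetical helpers introduced for illustration, and the PDO connection is assumed to point at the same database, with the `test` table and `detail` column from the question.

```php
<?php
// Hypothetical helper: returns the text of every <p> inside the description block.
function extractParagraphs(string $html): array
{
    $doc = new DOMDocument();
    // Collect libxml warnings from imperfect markup internally instead of using @.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);
    // Select the <p> elements themselves, so whitespace text nodes are never visited.
    $nodes = $xpath->query("//div[@id='description-text']/div/p");

    $details = [];
    foreach ($nodes as $node) {
        // Collapse runs of whitespace and trim the ends.
        $text = trim(preg_replace('/\s+/', ' ', $node->textContent));
        if ($text !== '') {
            $details[] = $text;
        }
    }
    return $details;
}

// Hypothetical helper: inserts one row per paragraph.
// Assumes $pdo is an open connection and that table `test` has a `detail` column.
function insertDetails(PDO $pdo, array $details): void
{
    $stmt = $pdo->prepare('INSERT INTO `test` (`detail`) VALUES (:detail)');
    foreach ($details as $detail) {
        // The bound parameter is escaped by the driver, so quotes in the text are safe.
        $stmt->execute([':detail' => $detail]);
    }
}
```

With these in place, the body of the while loop would reduce to something like insertDetails($pdo, extractParagraphs(file_get_contents($url))). Because the text is passed as a bound parameter, the preg_replace filter is no longer needed to keep quotes out of the SQL string.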