Question

Possible duplicate:
How to parse and process HTML with PHP?

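I am building a website that presents different products, and I am running into some problems with cURL. Basically, I need to fetch parts of the HTML from different websites and display them on my site: title, model, description, user reviews, and so on. I managed to get some code working, but it stops working when I change the source URL, even though the source site is the same. My code: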

$url = "http://www.tigerdirect.com/applications/SearchTools/item-details.asp?EdpNo=2819129&CatId=4938";

//$url = "http://www.tigerdirect.com/applications/SearchTools/item-details.asp?EdpNo=1808177&csid=_61"; //this one is not working....

$ch = curl_init();
curl_setopt($ch, CURLOPT_URL,$url);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, 1);
curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, 0);
curl_setopt($ch, CURLOPT_SSL_VERIFYHOST, 1);

$source = curl_exec ($ch);

$start_description1 = "</tr>
</tbody>
</table>




<p>";
$end_description1 = "</div>
</div>
<div id=\"Videos\" style=\"display:inline;\">";
$description1_start_pos = strpos($source, $start_description1) + strlen($start_description1);
$description1_end_pos = strpos($source, $end_description1) - $description1_start_pos;
$description1 = substr($source, $description1_start_pos, $description1_end_pos);
echo $description1;

It works perfectly, but it stops working if I change the URL. The problem is the HTML in $start_description1: on other pages the HTML is different.

Instead of:

</tr>
</tbody>
</table>




<p>

the new page has:

</tr>
</tbody>
</table>


<p>

or:

</tr>
</tbody>
</table>

<p>

How can I avoid this error? Or what should I do to avoid cURL errors and retrieve the content I want?

Thank you!

Answer 1

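You should parse the HTML and extract the description from the parsed document, rather than using strpos.

For this application, I'd suggest PHP Simple HTML DOM Parser.

Here is an example of how it works: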

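// Assumes simple_html_dom.php (from PHP Simple HTML DOM Parser) is on the include path.
require_once 'simple_html_dom.php';

// Fetch the page and parse it into a DOM object.
$html = file_get_html('http://www.tigerdirect.com/applications/SearchTools/item-details.asp?EdpNo=1808177&csid=_61');
// Find the first <p> element on the page.
$p = $html->find('p', 0);

// Print its text content with the tags stripped.
echo $p->plaintext;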
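
If adding a third-party library is not an option, the same idea can be sketched with PHP's built-in DOM extension. This is only an illustration: the helper name extractDescription, the sample markup, and the XPath query are assumptions, not the real structure of the tigerdirect.com page, so the query would need adjusting to the actual markup.

```php
<?php
// Sketch only: the sample markup and the XPath query are assumptions,
// not the actual structure of the tigerdirect.com page.

/**
 * Returns the trimmed text of the first <p> sibling that follows a <table>,
 * or null if there is no such element.
 */
function extractDescription(string $html): ?string
{
    if ($html === '') {
        return null; // loadHTML() rejects an empty string
    }

    $doc = new DOMDocument();
    // Real-world HTML is rarely valid; collect parser warnings instead of printing them.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);
    // Nearest following <p> sibling of a <table>; whitespace between them is irrelevant.
    $nodes = $xpath->query('//table/following-sibling::p[1]');
    if ($nodes === false || $nodes->length === 0) {
        return null;
    }

    return trim($nodes->item(0)->textContent);
}

// Three variants that differ only in the blank lines between </table> and <p>,
// mirroring the differences described in the question.
$variants = [
    "<table><tbody><tr><td>x</td></tr>\n</tbody>\n</table>\n\n\n\n\n<p>Product description</p>",
    "<table><tbody><tr><td>x</td></tr>\n</tbody>\n</table>\n\n\n<p>Product description</p>",
    "<table><tbody><tr><td>x</td></tr>\n</tbody>\n</table>\n\n<p>Product description</p>",
];

foreach ($variants as $html) {
    echo extractDescription($html), PHP_EOL;
}
```

Because the lookup goes through the parsed element tree, the number of blank lines between the table and the paragraph no longer matters. The $source string returned by curl_exec in the question could be passed to this function directly.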

Hope this helps.
