Question

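I'm writing a script that has to extract all the tags from a URL, and not just the values inside them. I mean the complete markup of each tag, like this: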

<a href="test">Text</a>

I found some examples using preg_match_all, but they only extract the values of href, title, and so on, not the whole tag markup. How can I do this?

Answer 1

You can use an HTML parser: "A HTML DOM parser written in PHP5+ let you manipulate HTML in a very easy way!"

Answer 2

Use the Simplehtmldom library to fetch data from a URL:

// Include the library
include('simple_html_dom.php');

// Retrieve the DOM from a given URL
$html = file_get_html('http://davidwalsh.name/');

// Find all "A" tags and print their HREFs
foreach($html->find('a') as $e) 
    echo $e->href . '<br>';

// Retrieve all images and print their SRCs
foreach($html->find('img') as $e)
    echo $e->src . '<br>';
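
Note that the loops above print only attribute values, while the question asks for the whole tag. One way to get the complete markup is PHP's built-in DOM extension, where DOMDocument::saveHTML() accepts a node and serializes it together with its attributes and children. This is a minimal sketch rather than a definitive solution: extractTags() is a hypothetical helper name, and the HTML is an inline sample. For a real page you would first load the source yourself, for example with file_get_contents(), assuming allow_url_fopen is enabled.

```php
<?php
// Sketch: collect the complete markup of every element with a given tag name.
// extractTags() is a hypothetical helper, not part of any library.
function extractTags(string $html, string $tagName): array
{
    $doc = new DOMDocument();

    // Real-world pages are rarely valid HTML, so collect parser
    // warnings internally instead of printing them.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $tags = [];
    foreach ($doc->getElementsByTagName($tagName) as $node) {
        // Passing a node to saveHTML() serializes that node only,
        // including its attributes and child nodes.
        $tags[] = $doc->saveHTML($node);
    }
    return $tags;
}

// Inline sample input (an assumption for illustration).
$sample = '<p><a href="test">Text</a> and <a href="other" title="t">More</a></p>';

foreach (extractTags($sample, 'a') as $tag) {
    // Escape the markup so it shows up as text when viewed in a browser.
    echo htmlspecialchars($tag), "\n";
}
```

If you prefer to stay with Simple HTML DOM, its elements should expose the same kind of result through the outertext property (for example $e->outertext inside the loops above), but check the library's manual for the version you use.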

PHP - Extract tags from a URL
