Question

I want to get all links with the class "page1" from a page in PHP. The equivalent code in jQuery:

$("a#page1").each(function()
{
});

Can this be done in PHP?

$pattern = '`.*?((http|ftp)://[\w#$&+,\/:;=?@%.-]+)[^\w#$&+,\/:;=?@%.-]*?`i';
preg_match_all($pattern,$page_g,$matches);

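This code gets every href in $page_g, but it does not filter by class="page1". I only want the hrefs in $page_g whose link has class="page1". Can you help me refine the regular expression, or suggest another way? For example: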

$page_g = '<a href="/?s=cache:16001429:office+s01e02" title="" class="big">the <strong>office</strong> us s01 05 xvid mu</a> <a href="asd.com" class="a">asd</a>';

I want only /?s=cache:16001429:office+s01e02 to be returned. Thanks.

Answer 1

You lack the expertise to use regular expressions for this, which is why DOMDocument is the advisable solution. If you want a simpler API, use one of the jQuery lookalikes, phpQuery or QueryPath:

$link = qp($html)->find("a#page1")->attr("href");
print $link;

Answer 2

Edited after you clarified the question.

To get all <a> links with the class page1:

// Load the HTML from a file
$your_HTML_string = file_get_contents("html_filename.html");

$doc = new DOMDocument();
$doc->loadHTML($your_HTML_string);

// Then select all <a> tags in the document
$a_links = $doc->getElementsByTagName("a");

foreach ($a_links as $link) {
  // If a link can have more than one class,
  // you'll need to use (strpos($link->getAttribute("class"), "page1") !== false)
  // instead of == "page1" (strpos() returns false, not -1, when nothing is found)

  if ($link->getAttribute("class") == "page1") {
    // do something
  }
}
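
The multi-class case mentioned in the comments can also be handled by splitting the class attribute into separate tokens, so that a class such as page10 is not matched by substring. This is only a minimal sketch, assuming the HTML is already held in a string; the helper name hrefsByClass and the sample markup are hypothetical and not part of the original answer:

```php
<?php
// Hypothetical helper: collect the href of every <a> whose class list contains $class.
function hrefsByClass(string $html, string $class): array
{
    $doc = new DOMDocument();
    // Buffer libxml warnings about imperfect markup instead of printing them.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $hrefs = [];
    foreach ($doc->getElementsByTagName('a') as $link) {
        // Split the class attribute on whitespace so "page1 big" matches but "page10" does not.
        $classes = preg_split('/\s+/', trim($link->getAttribute('class')), -1, PREG_SPLIT_NO_EMPTY);
        if (in_array($class, $classes, true)) {
            $hrefs[] = $link->getAttribute('href');
        }
    }
    return $hrefs;
}

// Sample markup invented for illustration.
$html = '<a href="/one" class="page1 big">one</a> <a href="/two" class="page10">two</a>';
print_r(hrefsByClass($html, 'page1'));
```

The token comparison is stricter than the strpos() check, which would also accept page10 because it contains the substring page1.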

Answer 3

Use DOMDocument to parse the HTML page. Here is a tutorial:

Tutorial

Answer 4

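Using DOM is preferable here, because regular expressions become hard to maintain once the underlying HTML changes. In addition, DOM can cope with invalid HTML and gives you access to other tools for parsing HTML.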

So, assuming you have a file containing the HTML and you are searching for a class, this is probably the way to go:

$doc = new DOMDocument;
// loadHTMLFile() uses the lenient HTML parser; load() expects well-formed XML
$doc->loadHTMLFile(PATH_TO_YOUR_FILE);
// Use XPath to find every <a> whose class attribute contains your class;
// a tag can have more than one class, and XPath makes that case easier.
$xpath = new DOMXPath($doc);
$list = $xpath->query("//a[contains(@class, 'page1')]"); 
foreach ($list as $a_tag) {
    $href = $a_tag->getAttribute('href');
    //do something
}
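
One caveat with contains(@class, 'page1') is that it also matches classes such as page10. A common XPath 1.0 idiom pads the normalized class list with spaces and looks for the class as a whole token. The sketch below applies that idiom to the sample string from the question; because that sample uses class="big" rather than page1, the $class variable is set to big here purely for illustration, and the libxml error handling is an added assumption rather than part of the original answer:

```php
<?php
// Sample input: the markup from the question.
$page_g = '<a href="/?s=cache:16001429:office+s01e02" title="" class="big">the <strong>office</strong> us s01 05 xvid mu</a> <a href="asd.com" class="a">asd</a>';

$doc = new DOMDocument();
// Buffer parser warnings instead of printing them.
$previous = libxml_use_internal_errors(true);
$doc->loadHTML($page_g);
libxml_clear_errors();
libxml_use_internal_errors($previous);

$xpath = new DOMXPath($doc);

// Illustrative class name; the question's sample has no link with class "page1".
$class = 'big';

// Pad the class list with spaces so only whole class names match.
$query = "//a[contains(concat(' ', normalize-space(@class), ' '), ' $class ')]";

$hrefs = [];
foreach ($xpath->query($query) as $a_tag) {
    $hrefs[] = $a_tag->getAttribute('href');
}
print_r($hrefs);
```

Only the first link is collected, since the second link's class is a. If $class ever comes from user input, it should be validated before being interpolated into the query string.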

Select a tag in PHP and get its href
