Question

I am using PHP to get part of an HTML file:

HTML file:

<div class="titles">
    <h2><a href="#">First Title</a></h2>
</div>

How can I get the href and the text of the 'a' tag separately?

I want to save the title and the link to a database,

so I need '#' and 'First Title', not the 'a' tag itself.

Answer 1

$link should be a Simple HTML DOM element object, so you can read its attributes with $link->href and its text content with $link->plaintext. See http://simplehtmldom.sourceforge.net/manual.htm.

Answer 2

You can use the DOMDocument and DOMXPath objects (PHP >= 5).

Reference: http://php.net/manual/en/class.domdocument.php

Part of the example code:

$html = '<div class="titles">
<h2><a href="#">First Title</a></h2>
</div>';

// Parse the HTML string into a DOM tree
$page = new DOMDocument();
$page->loadHTML($html);

// Select every <a> element in the document
$xpath = new DOMXPath($page);
$a = $xpath->query("//a");

for ($i = 0; $i < $a->length; $i++) {
    $_a = $a->item($i);

    echo $_a->getAttribute("href"); // link target, here "#"
    echo "<br>";
    echo $_a->textContent;          // link text, here "First Title"
}
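
Since the goal in the question is to store the title and the link rather than print them, the same DOMDocument/DOMXPath approach can be wrapped so that it returns the pairs as an array. The following is only a sketch: the function name extractTitleLinks and the array keys 'href' and 'title' are made up for illustration, and the XPath assumes the links always sit inside a div whose class attribute is exactly "titles", as in the question's HTML.

```php
<?php
// Hypothetical helper (not part of the original answer): collects the href
// and the text of every <a> found inside <div class="titles">.
function extractTitleLinks(string $html): array
{
    $doc = new DOMDocument();

    // Collect libxml parse warnings internally instead of printing them,
    // then restore the previous setting.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);
    $links = [];

    // Assumption: the class attribute is exactly "titles" (no extra classes).
    foreach ($xpath->query('//div[@class="titles"]//a') as $a) {
        $links[] = [
            'href'  => $a->getAttribute('href'),
            'title' => trim($a->textContent),
        ];
    }

    return $links;
}

$html = '<div class="titles">
    <h2><a href="#">First Title</a></h2>
</div>';

$links = extractTitleLinks($html);
print_r($links);
```

Each entry of $links then holds one link target and one title, which could be bound to a prepared INSERT statement when saving to the database.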

php - Get the value of a link

2 answers: