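I want to find all <a href> tags and get their link and full content

Date: 2018-06-19 17:54:48

Tags: php regex preg-match-all

My problem is that I want to extract the following from a large chunk of HTML code (all the other tags surrounding the href tags are not shown!):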

<a href="/admin/home" torero-icon="home">Home</a>
  

Here I want to get "/admin/home" first, and then the tag's content, "Home".

<a href="#" torero-icon="add" torero-left-icon="accessibility">Account Verwaltung</a>
  

Here I want to get "#" first, and then the tag's content, "Account Verwaltung".

Thanks for your help, guys :)

3 Answers:

Answer 0: (score: 0)

I have been working on something similar:

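// preg_match_all() returns the number of matches; the matches themselves are written to $urls.
preg_match_all('#\bhttps?://[^,\s()<>]+(?:\([\w\d]+\)|([^,[:punct:]\s]|/))#', $page, $urls);

This grabs all URLs; to get all hrefs instead, you will need to adapt the regex so that it matches exactly what you want.

You can then loop over the results with a foreach statement ($urls[0] holds the full matches):

foreach ($urls[0] as $url) {
    echo "url: " . $url;
}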
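
One way to adapt the idea to href attributes is a pattern that captures the quoted attribute value. The following is only a minimal sketch, not the answerer's code: the pattern and the sample `$page` string are assumptions, and it handles double-quoted href values only.

```php
<?php
// Sample input (assumed for illustration; markup copied from the question).
$page = '<a href="/admin/home" torero-icon="home">Home</a>'
      . '<a href="#" torero-icon="add" torero-left-icon="accessibility">Account Verwaltung</a>';

// Capture the double-quoted href value of every <a> tag.
// [^>]*? keeps the match inside one tag; [^"]* stops at the closing quote.
preg_match_all('/<a\s[^>]*?href="([^"]*)"/i', $page, $matches);

// $matches[1] holds the first capture group of every match.
foreach ($matches[1] as $href) {
    echo "href: " . $href . PHP_EOL;
}
```

With the default `PREG_PATTERN_ORDER` flag, `$matches[0]` contains the full matches and `$matches[1]` the captured href values.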

Answer 1: (score: 0)

I found something that does the trick:

 preg_match_all('<a href="(.*)" (.*)>',$text,$match);
  

Result:

Array
 (
  [0] => Array
    (
        [0] => a href="/redirect/torero::external/https[dd][s][s]www[d]google[d]de[s]/8CF0-6416-DAEF-8C2B-1819" torero-modified="link-leading-external">Google
        [1] => a href="/admin/home" torero-icon="home">Home
        [2] => a href="/admin/pages" torero-icon="pages">Seiten
        [3] => a href="#" torero-icon="add" torero-left-icon="accessibility">Account Verwaltung
        [4] => a href="/admin/accounts/users" torero-icon="person">Benutzer
        [5] => a href="/admin/accounts/permissions" torero-icon="check">Rechte
        [6] => a href="#" torero-icon="add" torero-left-icon="trending_up">Statistiken
        [7] => a href="/admin/statistics/trending" torero-icon="timeline">Beliebte Beiträge
        [8] => a href="/admin/statistics/visibility" torero-icon="visibility">SEO Statistiken
        [9] => a href="/admin/layouts" torero-icon="view_quilt">Layouts
        [10] => a href="#" torero-icon="add" torero-left-icon="settings">Einstellungen
        [11] => a href="/admin/settings/profile" torero-icon="person_pin">Profil
        [12] => a href="/admin/settings/extensions" torero-icon="extension">Erweiterungen
        [13] => a href="/admin/settings/updates" torero-icon="refresh">Software Updates
        [14] => a href="/admin/settings/info" torero-icon="info">System Info
        [15] => a href="/admin/settings/report" torero-icon="bug_report">Fehler melden
        [16] => a href="/admin/settings/feedback" torero-icon="feedback">Feedback geben
        [17] => a href="/admin/logout" torero-icon="exit_to_app">Abmelden
    )

[1] => Array
    (
        [0] => /redirect/torero::external/https[dd][s][s]www[d]google[d]de[s]/8CF0-6416-DAEF-8C2B-1819
        [1] => /admin/home
        [2] => /admin/pages
        [3] => #" torero-icon="add
        [4] => /admin/accounts/users
        [5] => /admin/accounts/permissions
        [6] => #" torero-icon="add
        [7] => /admin/statistics/trending
        [8] => /admin/statistics/visibility
        [9] => /admin/layouts
        [10] => #" torero-icon="add
        [11] => /admin/settings/profile
        [12] => /admin/settings/extensions
        [13] => /admin/settings/updates
        [14] => /admin/settings/info
        [15] => /admin/settings/report
        [16] => /admin/settings/feedback
        [17] => /admin/logout
    )

[2] => Array
    (
        [0] => torero-modified="link-leading-external">Google
        [1] => torero-icon="home">Home
        [2] => torero-icon="pages">Seiten
        [3] => torero-left-icon="accessibility">Account Verwaltung
        [4] => torero-icon="person">Benutzer
        [5] => torero-icon="check">Rechte
        [6] => torero-left-icon="trending_up">Statistiken
        [7] => torero-icon="timeline">Beliebte Beiträge
        [8] => torero-icon="visibility">SEO Statistiken
        [9] => torero-icon="view_quilt">Layouts
        [10] => torero-left-icon="settings">Einstellungen
        [11] => torero-icon="person_pin">Profil
        [12] => torero-icon="extension">Erweiterungen
        [13] => torero-icon="refresh">Software Updates
        [14] => torero-icon="info">System Info
        [15] => torero-icon="bug_report">Fehler melden
        [16] => torero-icon="feedback">Feedback geben
        [17] => torero-icon="exit_to_app">Abmelden
    )

)
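
In the output above, entries 3, 6 and 10 of group 1 contain `#" torero-icon="add` rather than `#`. This is because the greedy `(.*)` runs up to the last `" ` on the line. Note also that PHP treats the leading `<` and trailing `>` as the pattern delimiters, which is why the matches start at `a href` instead of `<a href`. A possible refinement is sketched below; the pattern and the sample `$text` are assumptions, and it presumes double-quoted href values that come first in the tag and link text without nested `<a>` tags.

```php
<?php
// Sample input (assumed; markup copied from the question).
$text = '<a href="/admin/home" torero-icon="home">Home</a>' . "\n"
      . '<a href="#" torero-icon="add" torero-left-icon="accessibility">Account Verwaltung</a>';

// Group 1: the href value, limited to a single attribute by [^"]*.
// Group 2: everything between the opening tag and </a>, matched lazily.
$pattern = '~<a\s+href="([^"]*)"[^>]*>(.*?)</a>~is';

// PREG_SET_ORDER groups the captures per link instead of per capture group.
preg_match_all($pattern, $text, $links, PREG_SET_ORDER);

foreach ($links as $link) {
    echo $link[1] . ' => ' . $link[2] . PHP_EOL;
}
```

Using `~` as the delimiter avoids the clash with the angle brackets, and the negated character class keeps each href capture from spilling into the following attributes.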

Answer 2: (score: 0)

If this is a simple string, use strstr or preg_match_all. If you have a full HTML document, use PHP's built-in DOMDocument. Consider:

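$page_html = "<!DOCTYPE html>\n<html>\n...</body>\n</html>";
// loadHTML() is an instance method, so create the document first.
$doc = new \DOMDocument();
$doc->loadHTML( $page_html );

// Iterate over every <a> element and print its href attribute.
$anchors = $doc->getElementsByTagName('a');
foreach ( $anchors as $a ) {
    echo "Anchor HREF: " . $a->getAttribute('href') . PHP_EOL;
}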

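Without proper tokenization, string-based approaches will miss edge cases. For example, how would you handle commented-out anchors? Or anchors that do not exactly follow the form you expect? The DOMDocument parser should capture exactly what you want.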
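
To illustrate the edge cases mentioned here, the sketch below collects both the href and the text content of each anchor, which is what the question asks for. The sample document (including the commented-out anchor) and the `$result` variable are assumptions made for illustration.

```php
<?php
// Sample document (assumed for illustration), including a commented-out anchor.
$page_html = <<<HTML
<!DOCTYPE html>
<html>
<body>
<a href="/admin/home" torero-icon="home">Home</a>
<!-- <a href="/admin/hidden">Hidden</a> -->
<a href="#" torero-icon="add" torero-left-icon="accessibility">Account Verwaltung</a>
</body>
</html>
HTML;

// Collect parser warnings internally instead of printing them for imperfect markup.
libxml_use_internal_errors(true);

$doc = new DOMDocument();
$doc->loadHTML($page_html);
libxml_clear_errors();

$result = [];
foreach ($doc->getElementsByTagName('a') as $a) {
    // getAttribute() returns the href value; textContent returns the link text.
    $result[] = [$a->getAttribute('href'), trim($a->textContent)];
}

foreach ($result as [$href, $text]) {
    echo $href . ' => ' . $text . PHP_EOL;
}
```

The commented-out anchor is parsed as a comment node, so `getElementsByTagName('a')` does not return it, and the extra `torero-*` attributes do not affect the href lookup.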