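My problem is that I want to extract the following from a large block of HTML. (All the other tags that contain an href attribute are not shown here.)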
<a href="/admin/home" torero-icon="home">Home</a>
Here I want to get "/admin/home" first, and then the whole tag's text, "Home".
<a href="#" torero-icon="add" torero-left-icon="accessibility">Account Verwaltung</a>
Here I want to get "#" first, and then the whole tag's text, "Account Verwaltung".
Thanks for your help, guys :)
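Answer 0 (score: 0)
I'm working on something similar:
// preg_match_all() returns the number of matches; the matches themselves land in $urls.
preg_match_all('#\bhttps?://[^,\s()<>]+(?:\([\w\d]+\)|([^,[:punct:]\s]|/))#', $page, $urls);
This fetches all absolute URLs. To get all href values instead, you need to change the regular expression to one that matches exactly what you are after.
You can then loop over the full matches, which are stored in $urls[0], with a foreach statement:
foreach ($urls[0] as $url) {
    echo "url: " . $url . PHP_EOL;
}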
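As a rough idea of what a pattern narrowed down to href values might look like, here is a minimal sketch. The sample markup in `$page` and the `$matches` variable name are assumptions for illustration, and the pattern only handles double-quoted href attributes.

```php
<?php
// Sample markup (an assumption for illustration); in practice $page holds the fetched HTML.
$page = '<a href="/admin/home" torero-icon="home">Home</a>'
      . '<a href="#" torero-icon="add" torero-left-icon="accessibility">Account Verwaltung</a>';

// Capture the value of every double-quoted href attribute.
// [^"]* stops at the first closing quote, so later attributes are not swallowed.
preg_match_all('/\bhref="([^"]*)"/i', $page, $matches);

// $matches[0] holds the full matches, $matches[1] the captured href values.
foreach ($matches[1] as $href) {
    echo "href: " . $href . PHP_EOL;
}
```

This only collects the attribute values; it does not look at the link text at all.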
Answer 1 (score: 0)
I found something that does the trick:
preg_match_all('<a href="(.*)" (.*)>', $text, $match);
This results in:
Array
(
    [0] => Array
        (
            [0] => a href="/redirect/torero::external/https[dd][s][s]www[d]google[d]de[s]/8CF0-6416-DAEF-8C2B-1819" torero-modified="link-leading-external">Google
            [1] => a href="/admin/home" torero-icon="home">Home
            [2] => a href="/admin/pages" torero-icon="pages">Seiten
            [3] => a href="#" torero-icon="add" torero-left-icon="accessibility">Account Verwaltung
            [4] => a href="/admin/accounts/users" torero-icon="person">Benutzer
            [5] => a href="/admin/accounts/permissions" torero-icon="check">Rechte
            [6] => a href="#" torero-icon="add" torero-left-icon="trending_up">Statistiken
            [7] => a href="/admin/statistics/trending" torero-icon="timeline">Beliebte Beiträge
            [8] => a href="/admin/statistics/visibility" torero-icon="visibility">SEO Statistiken
            [9] => a href="/admin/layouts" torero-icon="view_quilt">Layouts
            [10] => a href="#" torero-icon="add" torero-left-icon="settings">Einstellungen
            [11] => a href="/admin/settings/profile" torero-icon="person_pin">Profil
            [12] => a href="/admin/settings/extensions" torero-icon="extension">Erweiterungen
            [13] => a href="/admin/settings/updates" torero-icon="refresh">Software Updates
            [14] => a href="/admin/settings/info" torero-icon="info">System Info
            [15] => a href="/admin/settings/report" torero-icon="bug_report">Fehler melden
            [16] => a href="/admin/settings/feedback" torero-icon="feedback">Feedback geben
            [17] => a href="/admin/logout" torero-icon="exit_to_app">Abmelden
        )

    [1] => Array
        (
            [0] => /redirect/torero::external/https[dd][s][s]www[d]google[d]de[s]/8CF0-6416-DAEF-8C2B-1819
            [1] => /admin/home
            [2] => /admin/pages
            [3] => #" torero-icon="add
            [4] => /admin/accounts/users
            [5] => /admin/accounts/permissions
            [6] => #" torero-icon="add
            [7] => /admin/statistics/trending
            [8] => /admin/statistics/visibility
            [9] => /admin/layouts
            [10] => #" torero-icon="add
            [11] => /admin/settings/profile
            [12] => /admin/settings/extensions
            [13] => /admin/settings/updates
            [14] => /admin/settings/info
            [15] => /admin/settings/report
            [16] => /admin/settings/feedback
            [17] => /admin/logout
        )

    [2] => Array
        (
            [0] => torero-modified="link-leading-external">Google
            [1] => torero-icon="home">Home
            [2] => torero-icon="pages">Seiten
            [3] => torero-left-icon="accessibility">Account Verwaltung
            [4] => torero-icon="person">Benutzer
            [5] => torero-icon="check">Rechte
            [6] => torero-left-icon="trending_up">Statistiken
            [7] => torero-icon="timeline">Beliebte Beiträge
            [8] => torero-icon="visibility">SEO Statistiken
            [9] => torero-icon="view_quilt">Layouts
            [10] => torero-left-icon="settings">Einstellungen
            [11] => torero-icon="person_pin">Profil
            [12] => torero-icon="extension">Erweiterungen
            [13] => torero-icon="refresh">Software Updates
            [14] => torero-icon="info">System Info
            [15] => torero-icon="bug_report">Fehler melden
            [16] => torero-icon="feedback">Feedback geben
            [17] => torero-icon="exit_to_app">Abmelden
        )

)
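Two things in that output are worth noting. The `<` and `>` in the pattern act as the regex delimiters, which is why the matches start with `a href` rather than `<a href`. And because `(.*)` is greedy, entries 3, 6 and 10 of group 1 come back as `#" torero-icon="add` instead of just `#`. A possible tightening, sketched below under the assumption that href is always the first attribute and is double-quoted (as in the sample tags from the question), captures the href and the link text separately:

```php
<?php
// Sample input taken from the question; in practice $text holds the full HTML.
$text = '<a href="/admin/home" torero-icon="home">Home</a>' . "\n"
      . '<a href="#" torero-icon="add" torero-left-icon="accessibility">Account Verwaltung</a>';

// Group 1: href value, stopping at the first closing quote.
// [^>]*  : skip any remaining attributes up to the end of the opening tag.
// Group 2: link text up to the closing </a> (lazy, so it stops at the first one).
preg_match_all('~<a\s+href="([^"]*)"[^>]*>(.*?)</a>~is', $text, $match, PREG_SET_ORDER);

// With PREG_SET_ORDER each $m is one link: $m[1] is the href, $m[2] the text.
foreach ($match as $m) {
    echo $m[1] . ' => ' . $m[2] . PHP_EOL;
}
```

Anchors whose href is not the first attribute, or that use single quotes, would be missed by this pattern.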
Answer 2 (score: 0)
If this is just a simple string, use strstr or preg_match_all. If you have a full HTML document, use PHP's built-in DOMDocument. Consider:
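$page_html = "<!DOCTYPE html>\n<html>\n...</body>\n</html>";
// loadHTML() is an instance method; calling it statically throws an Error on PHP 8.
$doc = new \DOMDocument();
$doc->loadHTML( $page_html );
$anchors = $doc->getElementsByTagName('a');
foreach ( $anchors as $a ) {
    // Print the href attribute of each anchor element.
    echo "Anchor HREF: " . $a->getAttribute('href') . PHP_EOL;
}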
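Without proper tokenization, string-based approaches will miss edge cases. For example, how do you want to handle anchors that are commented out? Or anchors that don't quite follow the form you expect? The DOMDocument parser should capture exactly what you want.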
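Since the question asks for both the href and the link text, here is a small sketch of how that could look with DOMDocument. The sample document, the `$links` array, and the use of `libxml_use_internal_errors()` to keep parser warnings quiet are my own assumptions for illustration, not something taken from the original page.

```php
<?php
// Sample document (an assumption for illustration), including a commented-out anchor.
$page_html = '<!DOCTYPE html><html><body>'
    . '<!-- <a href="/hidden">Hidden</a> -->'
    . '<a href="/admin/home" torero-icon="home">Home</a>'
    . '<a href="#" torero-icon="add" torero-left-icon="accessibility">Account Verwaltung</a>'
    . '</body></html>';

// Collect parse warnings internally instead of printing them for imperfect markup.
libxml_use_internal_errors(true);
$doc = new DOMDocument();
$doc->loadHTML($page_html);
libxml_clear_errors();

$links = [];
foreach ($doc->getElementsByTagName('a') as $a) {
    // getAttribute() gives the href value; textContent is the text inside the tag.
    $links[] = [$a->getAttribute('href'), trim($a->textContent)];
}

foreach ($links as [$href, $label]) {
    echo $href . ' => ' . $label . PHP_EOL;
}
```

The anchor inside the HTML comment is parsed as part of a comment node, so it does not show up in `getElementsByTagName('a')`.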