I want to parse an HTML page with PHP's DOMDocument and extract the img elements by their alt attribute, using a function like getElementById(). Is there a way to do this?
Answer 0 (score: 1)
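DOMDocument has a method named getElementsByTagName that returns all elements with a given tag name. You can use it to get every `<img>` element and then read each one's `alt` attribute with getAttribute(). For example: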
```php
<?php
$htmlStr = <<<EOD
<!DOCTYPE html>
<html>
<head>
<title>Some nice page</title>
</head>
<body>
<h1>Something nice</h1>
<img id="beautiful-para" src="https://" alt="foo-hj" />
</body>
</html>
EOD;

$doc = new DOMDocument;
$doc->validateOnParse = true;
$doc->loadHTML($htmlStr);

// Collect every <img> element in the document.
$images = $doc->getElementsByTagName('img');
foreach ($images as $image) {
    // Read the alt attribute of each image.
    var_dump($image->getAttribute('alt')); // string(6) "foo-hj"
}
```
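Building on the loop above, here is a minimal sketch of a getElementById()-style lookup by alt value. `findImagesByAlt` is a hypothetical helper name, not part of PHP's DOM extension. It assumes exact, case-sensitive matching and PHP 7 or later (for the scalar and return type declarations).

```php
<?php
// Hypothetical helper (not a built-in DOMDocument method): returns every <img>
// element whose alt attribute exactly equals $alt.
function findImagesByAlt(DOMDocument $doc, string $alt): array
{
    $matches = [];
    foreach ($doc->getElementsByTagName('img') as $img) {
        // getAttribute() returns an empty string when the attribute is missing.
        if ($img->getAttribute('alt') === $alt) {
            $matches[] = $img;
        }
    }
    return $matches;
}

$html = '<!DOCTYPE html><html><body>'
    . '<img src="a.png" alt="foo-hj">'
    . '<img src="b.png" alt="bar">'
    . '<img src="c.png">'
    . '</body></html>';

$doc = new DOMDocument();
$doc->loadHTML($html);

foreach (findImagesByAlt($doc, 'foo-hj') as $img) {
    echo $img->getAttribute('src'), "\n"; // a.png
}
```

Unlike getElementById(), this returns an array, because several images can share the same alt text.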
Hope this helps.
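As an alternative to the getElementsByTagName loop (this is not what the answer above uses), DOMXPath can select images by their alt value in a single query. The sketch below assumes the alt value is a fixed string without double quotes, because it is embedded directly in the XPath expression. It also calls libxml_use_internal_errors(true), since real pages often contain markup that makes loadHTML() emit warnings.

```php
<?php
$html = <<<'EOD'
<!DOCTYPE html>
<html>
<body>
<img id="beautiful-para" src="https://" alt="foo-hj" />
<img src="other.png" alt="something else" />
</body>
</html>
EOD;

// Collect libxml parse errors internally instead of printing them as PHP warnings.
libxml_use_internal_errors(true);

$doc = new DOMDocument();
$doc->loadHTML($html);
libxml_clear_errors();

$xpath = new DOMXPath($doc);

// Every <img> whose alt attribute is exactly "foo-hj".
$byAlt = $xpath->query('//img[@alt="foo-hj"]');
foreach ($byAlt as $img) {
    echo $img->getAttribute('src'), "\n"; // https://
}

// Every alt value on the page, selected as attribute nodes.
foreach ($xpath->query('//img/@alt') as $attr) {
    echo $attr->value, "\n";
}
```

If the alt text comes from user input, build the expression carefully (or fall back to the loop-and-compare approach), since a quote character in the value would break the query.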