Question

I want to get the company name from this page. What I have tried:

<?PHP
$html = file_get_contents('https://www.goudengids.be/bedrijf/Willebroek/L11159413/CNC+Metal/');
$document = new DOMDocument;
$document ->loadHTML($html);
$xPath = new DOMXPath($document);
$anchorTags = $xPath->evaluate("//div[@class=\"title-logo\"]//h1");
foreach ((array)$anchorTags  as $anchorTag) {
    echo 'name : '.$anchorTag;
}
?>

I did something similar on another website and it worked, but here the $anchorTags array seems to be empty. Where is the problem? Thanks.

Answer 1

The XPath you are looking for is:

A simple //div[contains(@class,'title-logo')]//h1 won't do.
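One possible reading of the remark above is that contains() does substring matching on the class attribute, so it can also match unrelated class names. The sketch below compares three predicates against hypothetical markup (the two div elements, their class values, and the countMatches() helper are assumptions for illustration, not the real page's structure):

```php
<?php
// Hypothetical markup: one div carries "title-logo" as one of several
// classes, the other has a class that merely contains that substring.
$html = '<html><body>'
      . '<div class="title-logo big"><h1>CNC Metal</h1></div>'
      . '<div class="subtitle-logo"><h1>Other</h1></div>'
      . '</body></html>';

// Hypothetical helper: number of nodes an XPath query selects.
function countMatches(string $html, string $query): int
{
    $document = new DOMDocument;
    $document->loadHTML($html);
    $xPath = new DOMXPath($document);
    return $xPath->query($query)->length;
}

// Exact comparison: the whole attribute must equal "title-logo".
$exact = '//div[@class="title-logo"]//h1';
// Substring comparison: also matches "subtitle-logo".
$substring = "//div[contains(@class,'title-logo')]//h1";
// Token comparison: matches "title-logo" only as a whole class name.
$token = "//div[contains(concat(' ', normalize-space(@class), ' '), ' title-logo ')]//h1";

echo countMatches($html, $exact) . PHP_EOL;     // 0
echo countMatches($html, $substring) . PHP_EOL; // 2
echo countMatches($html, $token) . PHP_EOL;     // 1
```

Whether the real page needs the token form depends on its actual class attributes, which are not shown here.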

Answer 2

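You don't need to cast the result of the XPath evaluate() method to use it in foreach(), and you also need to read (I assume) nodeValue to get the actual content of the heading tag...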

foreach ($anchorTags  as $anchorTag) {
    echo 'name : '.$anchorTag->nodeValue;
}

This will output...

name : CNC Metal
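To try this without fetching the live page, here is a minimal self-contained sketch. The inline HTML string and the headingTexts() helper are assumptions made for illustration; the real page's markup may differ.

```php
<?php
// Hypothetical stand-in for the downloaded page.
$html = '<html><body><div class="title-logo"><h1>CNC Metal</h1></div></body></html>';

// Hypothetical helper: text content of every node selected by $query.
function headingTexts(string $html, string $query): array
{
    $document = new DOMDocument;
    $document->loadHTML($html);
    $xPath = new DOMXPath($document);

    $texts = [];
    // Iterate the DOMNodeList directly, without an (array) cast.
    foreach ($xPath->evaluate($query) as $node) {
        // Each item is a DOMNode object; nodeValue holds its text.
        $texts[] = trim($node->nodeValue);
    }
    return $texts;
}

foreach (headingTexts($html, '//div[@class="title-logo"]//h1') as $name) {
    echo 'name : ' . $name . PHP_EOL; // name : CNC Metal
}
```

Echoing the node itself, as in the question, cannot work because it is an object rather than a string, which is why nodeValue is read here.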

Answer 3

This works for me:

$html = file_get_contents('https://www.goudengids.be/bedrijf/Willebroek/L11159413/CNC+Metal/');
$document = new DOMDocument;
@$document->loadHTML($html); // using @ here to suppress a warning

$headings = $document->getElementsByTagName('h1');
foreach ($headings as $node) {
    echo 'name : '.$node->nodeValue;
}
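As an alternative to the @ operator, parser warnings can be collected internally with libxml_use_internal_errors(). The following is only a sketch: the inline HTML and the loadHtmlQuietly() helper are assumptions, not part of the answer above.

```php
<?php
// Hypothetical stand-in markup using an HTML5 tag, which some libxml2
// builds report as invalid when parsing.
$html = '<html><body><header><h1>CNC Metal</h1></header></body></html>';

// Hypothetical helper: parse HTML without emitting parser warnings.
function loadHtmlQuietly(string $html): DOMDocument
{
    // Buffer libxml errors instead of raising PHP warnings.
    $previous = libxml_use_internal_errors(true);
    $document = new DOMDocument;
    $document->loadHTML($html);
    // Discard the buffered errors and restore the earlier setting.
    libxml_clear_errors();
    libxml_use_internal_errors($previous);
    return $document;
}

$document = loadHtmlQuietly($html);
foreach ($document->getElementsByTagName('h1') as $node) {
    echo 'name : ' . $node->nodeValue . PHP_EOL; // name : CNC Metal
}
```

Unlike @, this only affects libxml's own diagnostics, so unrelated errors on the same line would still be reported.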

Unable to get the h1 from a web page in PHP
