How to get href, image src and title from given HTML using DOMDocument

Posted: 2017-09-16 06:34:11

Tags: php html domdocument

Given HTML:

<div id="testid">
  <h1>Test Title</h1>
  <ul class="clearfix">
    <li class="anker" id="artists-A"></li>
    <li class="first">
      <a href="www.test1.html" title="Test1">
        <span>
          <img src="https://www.test1.de/img/test1.jpg" alt="Test1" />
          <span>Test1</span>
        </span>
      </a>
    </li>
    <li>
      <a href="www.test2.html" title="Test2">
        <span>
          <img src="https://www.test2.de/img/test2.jpg" alt="Test2" />
          <span>Test2</span>
        </span>
      </a>
    </li>
    <li class="first">
      <a href="www.test3.html" title="Test3">
        <span>
          <img src="https://www.test1.de/img/test3.jpg" alt="Test3" />
          <span>Test3</span>
        </span>
      </a>
    </li>
  </ul>
</div>

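I need to get the href value, the img src, and the text of the inner span (i.e. the title). I am parsing it with DOMDocument, but I am not getting the exact result.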

Code:

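$doc = new DOMDocument();
$doc->validateOnParse = true;
libxml_use_internal_errors(true);          // suppress warnings about non-strict HTML
$doc->loadHTML(file_get_contents($url));
libxml_clear_errors();
$xpath = new DOMXPath($doc);
// A node test ("*" or "div") is required before the predicate; "//[@id=...]" is not valid XPath
$nodes = $xpath->query('//*[@id="testid"]/ul/li');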
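A minimal sketch of the XPath route, assuming the sample markup above is inlined as a string (the question loads it from $url instead). An XPath location step needs a node test before its predicate, so `//div[@id="testid"]` (or `//*[@id="testid"]`) rather than a bare `//[@id="testid"]`. The sketch selects each link and then runs queries relative to that link for the image and the inner span; the variable names are illustrative.

```php
<?php
// Sample markup from the question, inlined so the sketch runs on its own.
$html = <<<'HTML'
<div id="testid">
  <h1>Test Title</h1>
  <ul class="clearfix">
    <li class="anker" id="artists-A"></li>
    <li class="first"><a href="www.test1.html" title="Test1"><span>
      <img src="https://www.test1.de/img/test1.jpg" alt="Test1" />
      <span>Test1</span></span></a></li>
    <li><a href="www.test2.html" title="Test2"><span>
      <img src="https://www.test2.de/img/test2.jpg" alt="Test2" />
      <span>Test2</span></span></a></li>
    <li class="first"><a href="www.test3.html" title="Test3"><span>
      <img src="https://www.test1.de/img/test3.jpg" alt="Test3" />
      <span>Test3</span></span></a></li>
  </ul>
</div>
HTML;

// Collect libxml parse warnings internally instead of printing them.
libxml_use_internal_errors(true);
$doc = new DOMDocument();
$doc->loadHTML($html);
libxml_clear_errors();

$xpath = new DOMXPath($doc);
$items = [];

// Selecting the <a> elements directly skips the empty "anker" <li>.
foreach ($xpath->query('//div[@id="testid"]/ul/li/a') as $link) {
    // Expressions starting with "." are evaluated relative to $link.
    $img  = $xpath->query('.//img', $link)->item(0);
    $span = $xpath->query('./span/span', $link)->item(0);

    $items[] = [
        'href'  => $link->getAttribute('href'),
        'src'   => $img !== null ? $img->getAttribute('src') : null,
        'title' => $span !== null ? trim($span->textContent) : null,
    ];
}

print_r($items);
```

Grouping by link keeps each href, src and title together, so the three values never drift out of step if one item is missing an image.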

2 answers:

Answer 0 (score: 1)

We are using DOMDocument here. Currently I am collecting the href of each a tag and the src of each img tag; you can add more tags.
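This is not the answerer's original snippet but a rough sketch of the approach described: plain DOMDocument traversal with `getElementsByTagName()`, collecting href from each `a` and src from each `img`. It assumes the same sample markup inlined as a string. It reads the title from the `a` element's title attribute, which in this sample carries the same text as the inner span.

```php
<?php
$html = <<<'HTML'
<div id="testid"><h1>Test Title</h1><ul class="clearfix">
  <li class="anker" id="artists-A"></li>
  <li class="first"><a href="www.test1.html" title="Test1"><span>
    <img src="https://www.test1.de/img/test1.jpg" alt="Test1" /><span>Test1</span></span></a></li>
  <li><a href="www.test2.html" title="Test2"><span>
    <img src="https://www.test2.de/img/test2.jpg" alt="Test2" /><span>Test2</span></span></a></li>
  <li class="first"><a href="www.test3.html" title="Test3"><span>
    <img src="https://www.test1.de/img/test3.jpg" alt="Test3" /><span>Test3</span></span></a></li>
</ul></div>
HTML;

libxml_use_internal_errors(true);
$doc = new DOMDocument();
$doc->loadHTML($html);
libxml_clear_errors();

// Find <div id="testid"> by scanning <div> elements and comparing the id attribute.
$container = null;
foreach ($doc->getElementsByTagName('div') as $div) {
    if ($div->getAttribute('id') === 'testid') {
        $container = $div;
        break;
    }
}

$results = [];
if ($container !== null) {
    // getElementsByTagName() on an element searches only its descendants.
    foreach ($container->getElementsByTagName('a') as $a) {
        $img = $a->getElementsByTagName('img')->item(0);
        $results[] = [
            'href'  => $a->getAttribute('href'),
            'src'   => $img !== null ? $img->getAttribute('src') : null,
            'title' => $a->getAttribute('title'),
        ];
    }
}

print_r($results);
```

To take the title from the span text instead, look up the inner span inside each link and read its textContent.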

Answer 1 (score: 0)

I suggest using the SimpleHtmlDom library.

<?php

	require_once "SimpleHtmlDom.php";

	// Put your HTML code in the file contentToParse.html
	$htmlToParse = file_get_contents("contentToParse.html");

	$htmlObject = str_get_html($htmlToParse);

	$resultObject = array();

	// Each link holds its href, its image and its title span, so collect them together
	foreach($htmlObject->find("#testid ul li a") as $singleLink)
	{
		$singleImage = $singleLink->find("img", 0);        // first <img> inside the link
		$singleSpan  = $singleLink->find("span span", 0);  // inner <span> holding the title
		$resultObject[] = array(
			"href"  => $singleLink->href,
			"src"   => $singleImage ? $singleImage->src : null,
			"title" => $singleSpan ? $singleSpan->innertext : null,
		);
	}

	var_dump($resultObject);
?>