Question

This is the HTML from a website, and I want to grab just this text:

1,000 Places To See Before You Die

<ul class="listings">
<li>
<a href="http://watchseries.eu/serie/1,000_places_to_see_before_you_die" title="1,000 Places To See Before You Die">
1,000 Places To See Before You Die
<span class="epnum">2009</span>
</a>
</li>

I used code like this:

foreach($html->find('ul.listings li a') as $e)
echo $e->innertext. '<br/>';

and the output I get looks like this:

 999: Whats Your Emergency<span class="epnum">2012</span>

The span is included in the output. Please help me with this.

Answer 1

Why not use DOMDocument and get the title attribute?

$string = '<ul class="listings">
<li>
<a href="http://watchseries.eu/serie/1,000_places_to_see_before_you_die" title="1,000 Places To See Before You Die">
1,000 Places To See Before You Die
<span class="epnum">2009</span>
</a>
</li>';

$dom = new DOMDocument;
$dom->loadHTML($string);
$xpath = new DOMXPath($dom);
$text = $xpath->query('//ul[@class="listings"]/li/a/@title')->item(0)->nodeValue;
echo $text;

Or:

$text = explode("\n", trim($xpath->query('//ul[@class="listings"]/li/a')->item(0)->nodeValue));
echo $text[0];

Codepad Example
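If the title attribute is missing, or the link text is not split across lines the way explode("\n", ...) expects, another option is to collect only the anchor's own text nodes and skip child elements such as the span. This is a minimal sketch that assumes the same markup as in the question; the $titles array is just an illustrative name.

```php
<?php
$string = '<ul class="listings">
<li>
<a href="http://watchseries.eu/serie/1,000_places_to_see_before_you_die" title="1,000 Places To See Before You Die">
1,000 Places To See Before You Die
<span class="epnum">2009</span>
</a>
</li>
</ul>';

// Collect libxml parse warnings quietly instead of printing them.
libxml_use_internal_errors(true);

$dom = new DOMDocument;
$dom->loadHTML($string);
$xpath = new DOMXPath($dom);

$titles = [];
foreach ($xpath->query('//ul[@class="listings"]/li/a') as $a) {
    $own = '';
    // Keep only direct text nodes; the <span> element and its year are skipped.
    foreach ($a->childNodes as $node) {
        if ($node->nodeType === XML_TEXT_NODE) {
            $own .= $node->nodeValue;
        }
    }
    $titles[] = trim($own);
}

echo implode("\n", $titles), "\n";
```

With the markup above this prints only the link text, without the year.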

Answer 2

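I can think of two ways to solve this. The first is to read the title attribute from the anchor tag. Of course, not everyone sets a title attribute on their anchors, and even when they do, its value may differ from the link text. The other solution is to take the innertext property and then replace each child of the anchor tag with an empty string.

So, either do this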

$e->title;

or

$text = $e->innertext;
foreach ($e->children() as $child)
{
    $text = str_replace($child, '', $text);
}

That said, using DOMDocument instead is probably a good idea.

Answer 3

You can use strip_tags(), which removes the tags themselves but keeps the text inside them:

echo trim(strip_tags($e->innertext));

Or try preg_replace() to remove the unwanted tag together with its content:

echo preg_replace('/<span[^>]*>([\s\S]*?)<\/span[^>]*>/', '', $e->innertext);
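The difference between the two approaches can be seen on a plain string, without any parsing library. This is a small sketch; $inner is a hypothetical stand-in for what $e->innertext would return for the link in the question.

```php
<?php
// Hypothetical stand-in for $e->innertext from the question's markup.
$inner = "1,000 Places To See Before You Die\n<span class=\"epnum\">2009</span>\n";

// strip_tags() drops the <span> tags but keeps their text, so "2009" remains.
$stripped = trim(strip_tags($inner));

// Removing the whole <span>...</span> element also removes the year.
// The /s modifier lets "." match newlines inside the span.
$withoutSpan = trim(preg_replace('/<span\b[^>]*>.*?<\/span>/s', '', $inner));

echo $stripped, "\n";
echo "---\n";
echo $withoutSpan, "\n";
```

The first result still ends with the year, while the second contains only the title.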

Answer 4

First, check your HTML. Right now it looks like this:

  $string = '<ul class="listings">
               <li>
                  <a href="http://watchseries.eu/serie/1,000_places_to_see_before_you_die" title="1,000 Places To See Before You Die">
 1,000 Places To See Before You Die
                    <span class="epnum">2009</span>
                 </a>
             </li>';

There is no closing tag for the ul; maybe you missed it.

  $string = '<ul class="listings">
               <li>
                  <a href="http://watchseries.eu/serie/1,000_places_to_see_before_you_die" title="1,000 Places To See Before You Die">
 1,000 Places To See Before You Die
                    <span class="epnum">2009</span>
                 </a>
             </li>
            </ul>';

Then try this:

 $xml = simplexml_load_string($string);
 echo $xml->li->a['title'];

Answer 5

Use plaintext instead.

echo $e->plaintext;

The year will still be present, though; you can trim it off with a regular expression.

An example from the documentation:

$html = str_get_html("<div>foo <b>bar</b></div>");
$e = $html->find("div", 0);

echo $e->tag; // Returns: " div"
echo $e->outertext; // Returns: " <div>foo <b>bar</b></div>"
echo $e->innertext; // Returns: " foo <b>bar</b>"
echo $e->plaintext; // Returns: " foo bar"
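Trimming the year from the plaintext result could look like the following. It is only a sketch: $plain is a hypothetical value, and it assumes the year is always a trailing four-digit number separated from the title by whitespace.

```php
<?php
// Hypothetical value of $e->plaintext for the link in the question.
$plain = '1,000 Places To See Before You Die 2009';

// Remove a trailing four-digit number and the whitespace around it.
$title = preg_replace('/\s*\b\d{4}\s*$/', '', $plain);

echo $title, "\n";
```

Because the pattern is anchored to the end of the string, digits elsewhere in the title (such as "1,000") are left alone.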
