Question

I want to get the data inside a tag, for example:

<b>
    <sup>1</sup>A, a
</b>
<b> ab </b>
<b><sup>2</sup>A</b>

From the markup above I want to get A, a, ab and A.

But sometimes a data row has no <sup> tag, like this:

<b>ab</b>

In that case I still want to get the data ab.

I tried the following code:

foreach ($html->find('b') as $word) {
    $words = $word->innertext;
    echo $words . '<br>';
}

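But when there is a <sup> tag, the text inside that inner tag is printed as well. How can I get only the data that sits directly inside the <b> tag? Thanks.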

Answer 1

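With jQuery, you can use the clone() method to get only the text inside the parent element, i.e. <b>, while ignoring <sup> or any other element inside it.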

$('b')
    .clone()      // clone the element
    .children()   // select all child elements
    .remove()     // remove all the children
    .end()        // go back to the cloned element
    .text();      // get the remaining text
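
Since the question is about PHP, the same idea (keep an element's own text and skip its child elements) can be sketched on the server side with PHP's built-in DOM extension. This is only a minimal sketch, not the method from either answer: the helper name `ownTextOfTags` is hypothetical, and it assumes the fragment is simple enough for `DOMDocument::loadHTML` to parse.

```php
<?php
// Hypothetical helper (not part of the original answers): for every <$tag>
// element, concatenate only its direct text nodes and skip child elements
// such as <sup>.
function ownTextOfTags(string $html, string $tag = 'b'): array
{
    $doc = new DOMDocument();

    // Collect libxml parse warnings internally instead of printing them.
    $previous = libxml_use_internal_errors(true);
    // Wrap the fragment in a <div> so it is parsed as ordinary body content.
    $doc->loadHTML('<div>' . $html . '</div>');
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $results = [];
    foreach ($doc->getElementsByTagName($tag) as $element) {
        $text = '';
        foreach ($element->childNodes as $child) {
            // Keep text nodes only; element nodes like <sup> are ignored.
            if ($child->nodeType === XML_TEXT_NODE) {
                $text .= $child->nodeValue;
            }
        }
        $results[] = trim($text);
    }
    return $results;
}

$html = "<b>\n    <sup>1</sup>A, a\n</b>\n<b> ab </b>\n<b><sup>2</sup>A</b>";
print_r(ownTextOfTags($html)); // three entries: "A, a", "ab", "A"
```

A <b> without any <sup> needs no special case here, because its only child is a text node.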

Answer 2

Try:

<?php
$html = "<b>
            <sup>1</sup>A, a
        </b>
        <b> ab </b>
        <b><sup>2</sup>A</b>";
// remove all HTML tags except <sup>
$html = strip_tags($html, "<sup>");
// remove each <sup> tag together with its content
$html = preg_replace('#\<sup>[{\w},\s\d"]+\</sup>#', "", $html);
// remove tabs, newlines and carriage returns
$html = str_replace(array("\t", "\n", "\r"), "", $html);
// optionally, remove spaces from the string as well
$html = str_replace(" ", "", $html);
echo $html;
?>
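
Note that the snippet above joins everything into a single string (here `A,aabA`). If the values are needed separately, one possible variant is to process each <b> element on its own. The sketch below is an assumption-laden illustration rather than part of the answer: the function name `textPerBoldTag` is hypothetical, and the regular expressions assume plain `<b>` and `<sup>` tags without attributes or nesting.

```php
<?php
// Hypothetical variant of the regex approach: return one cleaned value
// per <b>...</b> pair instead of a single concatenated string.
function textPerBoldTag(string $html): array
{
    // Capture the inner content of every <b>...</b>; the "s" flag lets "."
    // match newlines so multi-line elements are captured too.
    preg_match_all('#<b>(.*?)</b>#s', $html, $matches);

    $values = [];
    foreach ($matches[1] as $inner) {
        // Drop each <sup> element together with its content.
        $inner = preg_replace('#<sup>.*?</sup>#s', '', $inner);
        // Strip any remaining tags and the surrounding whitespace.
        $values[] = trim(strip_tags($inner));
    }
    return $values;
}

$html = "<b>\n    <sup>1</sup>A, a\n</b>\n<b> ab </b>\n<b><sup>2</sup>A</b>";
print_r(textPerBoldTag($html)); // three entries: "A, a", "ab", "A"
```

Regular expressions are fragile on real-world HTML, so this is only reasonable for small, predictable fragments like the one in the question.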
