Strip tags and get the text content of each nested div on a page

Asked: 2015-11-29 14:36:11

Tags: php html-content-extraction

I fetched some HTML from a URL. All I want is the plain text content inside each div. Is there a way to do this? The structure will be similar to this:

<div class="first">
  <div class="second">
     Some content inside second div
    <div class="third">
      Some more content inside third div
    </div>
  </div>
</div>

When I extract the content, I want the plain text in an array, like this:

Array(
 [first]=>
 [second]=>Some content inside second div
 [third]=>Some more content inside third div
);

I am trying to do this with strip_tags, but I am confused about how to split the result and add it to an array. Any ideas would be appreciated.

1 Answer:

Answer 0 (score: 1)
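
A minimal sketch of the strip_tags approach the question mentions, assuming the markup is already held in a string. The variable names ($html, $lines, $result) are illustrative and not taken from the original answer. It strips the tags, splits the remaining text on line breaks, and keeps only the non-empty trimmed lines:

```php
<?php
// Sample markup from the question (assumed to be already loaded into a string)
$html = <<<HTML
<div class="first">
  <div class="second">
     Some content inside second div
    <div class="third">
      Some more content inside third div
    </div>
  </div>
</div>
HTML;

// Remove every tag, leaving only text and whitespace
$text = strip_tags($html);

// Split on any line break and trim each piece
$lines = array_map('trim', preg_split('/\R/', $text));

// Drop empty lines and reindex from 0
$result = array_values(array_filter($lines, 'strlen'));

print_r($result);
```

This gives a flat, numerically indexed array. It does not record which div each piece of text came from, and it relies on each text fragment sitting on its own line.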

This will output:

Array ( [0] => Some content inside second div [1] => Some more content inside third div )

If you are retrieving this from an external page, I strongly recommend using DOMDocument and XPath to get the elements.
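
One way the DOMDocument and XPath route could look, as a sketch under stated assumptions rather than a definitive implementation: the markup is already in $html (for a remote page it would have to be fetched first), every div of interest has a class attribute, and class names are unique. The variable names are illustrative. Each div contributes only its own direct text, so the result is keyed by class name as the question asks:

```php
<?php
// Sample markup from the question (assumed to be already loaded into a string)
$html = <<<HTML
<div class="first">
  <div class="second">
     Some content inside second div
    <div class="third">
      Some more content inside third div
    </div>
  </div>
</div>
HTML;

// Collect parser warnings internally instead of printing them,
// since real-world pages are often not perfectly valid
libxml_use_internal_errors(true);
$doc = new DOMDocument();
$doc->loadHTML($html);
libxml_clear_errors();

$xpath = new DOMXPath($doc);
$result = [];

// Visit every div that has a class attribute, in document order
foreach ($xpath->query('//div[@class]') as $div) {
    $own = '';
    // Gather only the div's direct text nodes, skipping text of nested divs
    foreach ($div->childNodes as $child) {
        if ($child->nodeType === XML_TEXT_NODE) {
            $own .= $child->nodeValue;
        }
    }
    // Assumption: class names are unique; a repeated class overwrites the earlier entry
    $result[$div->getAttribute('class')] = trim($own);
}

print_r($result);
```

For the sample markup, $result has an empty string for first, and the two sentences for second and third.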