I fetched an HTML page from a URL. All I want is to get the plain-text content inside the divs. Is this possible? The structure is similar to this:
<div class="first">
<div class="second">
Some content inside second div
<div class="third">
Some more content inside third div
</div>
</div>
</div>
When I extract the content, I want the plain text in an array, like this:
Array(
[first]=>
[second]=>Some content inside second div
[third]=>Some more content inside third div
);
I am trying to do this with strip_tags, but I am confused about how to split the result and add it to an array. Any ideas would be appreciated.
Answer 0 (score: 1)
The resulting output is:
Array ( [0] => Some content inside second div [1] => Some more content inside third div )
If you are retrieving this information from an external page, I strongly recommend using DOMDocument and XPath to get the elements.
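As a minimal sketch of the DOMDocument and XPath approach, assuming the markup looks like the sample in the question: the code below keys each entry by the div's class name and collects only the text nodes that are direct children of that div, so nested divs do not leak their text into the parent's entry. The `$html` string is a stand-in for whatever you download from your URL, and the `//div[@class]` query is an assumption about which elements you care about.

```php
<?php
// Sample markup copied from the question. In practice this string would be
// the page you fetched from your URL (not shown in the question).
$html = <<<HTML
<div class="first">
<div class="second">
Some content inside second div
<div class="third">
Some more content inside third div
</div>
</div>
</div>
HTML;

$doc = new DOMDocument();
// Collect parser warnings internally instead of printing them; real-world
// HTML is often not well-formed.
libxml_use_internal_errors(true);
$doc->loadHTML($html);
libxml_clear_errors();

$xpath = new DOMXPath($doc);
$result = [];

// Visit every div that has a class attribute, in document order.
foreach ($xpath->query('//div[@class]') as $div) {
    // Concatenate only the text nodes that are direct children of this div.
    $text = '';
    foreach ($xpath->query('text()', $div) as $node) {
        $text .= $node->nodeValue;
    }
    $result[$div->getAttribute('class')] = trim($text);
}

print_r($result);
```

For the sample markup, `$result` should have an empty string under `first` and the two text lines under `second` and `third`. If several divs share a class name, later ones overwrite earlier ones, so you may want to append to a list per class instead.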