Extract text with PHP and put it into an array

Time: 2010-06-23 18:24:39

Tags: php

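I have the following string and need to extract the text inside the divs (EDITOR'S PREFACE, MORE CONTENT, etc.) and put it into an array with PHP. How can I do this?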

Thanks in advance.

<div class='classit'><a href='site.php?site=1&filename=aname4'>EDITOR'S PREFACE</a></div> 
<div class='classit'><a href='site.php?site=4&filename=aname3'>MORE CONTENT</a></div> 
<div class='classit'><a href='site.php?site=3&filename=aname4'>LAST LINE</a></div> 

4 answers:

Answer 0 (score: 3)

Use Simple HTML DOM:

$html = <<<HTML
<div class='classit'><a href='site.php?site=1&filename=aname4'>EDITOR'S PREFACE</a></div> 
<div class='classit'><a href='site.php?site=4&filename=aname3'>MORE CONTENT</a></div> 
<div class='classit'><a href='site.php?site=3&filename=aname4'>LAST LINE</a></div> 
HTML;

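// str_get_html() comes from the Simple HTML DOM library, which must be loaded
// first, e.g. require_once 'simple_html_dom.php'; (the path is an assumption).
$src = str_get_html($html);
$elem = $src->find("div.classit a");

// Collect the visible text of each matched link.
$links = array();
foreach ($elem as $link) {
    $links[] = $link->plaintext;
}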

print_r($links);

Answer 1 (score: 1)

You can use PHP's own DOM extension:

$string = '<div><a>Elem 1</a></div><div><a>Elem 2</a></div>...etc';

$dom = new DOMDocument();
$dom->loadHTML($string);

$elements = $dom->getElementsByTagName('a');

// Collect the text content of every <a> element.
$textElements = array();
foreach ($elements as $node) {
    $textElements[] = $node->nodeValue;
}

If you are loading a larger HTML extract, you can query the DOMDocument with DOMXPath to get only the elements you need.

$xPathObj = new DOMXPath($dom);
$elements = $xPathObj->query("//div[@class='classit']/a");
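
Putting the two fragments above together, a minimal sketch of the DOMXPath approach run against the markup from the question might look like the following. The variable names ($html, $xpath, $texts) are illustrative, and the libxml_use_internal_errors() calls are an addition that only keeps parser warnings about the unescaped "&" in the href values from being printed.

```php
<?php
// Sample markup copied from the question.
$html = <<<HTML
<div class='classit'><a href='site.php?site=1&filename=aname4'>EDITOR'S PREFACE</a></div>
<div class='classit'><a href='site.php?site=4&filename=aname3'>MORE CONTENT</a></div>
<div class='classit'><a href='site.php?site=3&filename=aname4'>LAST LINE</a></div>
HTML;

// Collect libxml warnings internally instead of emitting them as PHP warnings.
$previous = libxml_use_internal_errors(true);

$dom = new DOMDocument();
$dom->loadHTML($html);

// Discard the collected warnings and restore the previous error handling mode.
libxml_clear_errors();
libxml_use_internal_errors($previous);

// Select only the links that are direct children of <div class="classit">.
$xpath = new DOMXPath($dom);

$texts = array();
foreach ($xpath->query("//div[@class='classit']/a") as $node) {
    $texts[] = trim($node->nodeValue);
}

print_r($texts);
```

Note that the predicate @class='classit' matches only when the class attribute is exactly that string; a div carrying additional classes would need a different expression.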

Edit

DOMNodeList supports foreach, so I changed for($i = 0; $i < $elements->length; $i++) {$elements->item($i)->nodeValue;} to foreach($elements as $node) {$node->nodeValue;}

Answer 2 (score: 0)

You can use strip_tags:

$s = "<div class='classit'><a href='site.php?site=1&fn=aname4'>EDITOR'S PREFACE</a></div> 
<div class='classit'><a href='site.php?site=4&filename=aname3'>MORE CONTENT</a></div> 
<div class='classit'><a href='site.php?site=3&filename=aname4'>LAST LINE</a></div> ";

// Split the string into lines and strip the markup from each one.
$new = array();
foreach (explode("\n", $s) as $val){
    $new[] = strip_tags($val);
}
var_dump($new);
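
The strip_tags approach relies on each element sitting on its own line, and it keeps whatever whitespace surrounds the tags. As a hedged sketch, the variant below trims each result and skips lines that end up empty. The blank line and the mixed line endings in the sample string are added here purely for illustration, and $lines is an illustrative name.

```php
<?php
// Sample input based on the answer above, with a blank line and a Windows
// line ending added for illustration.
$s = "<div class='classit'><a href='site.php?site=1&fn=aname4'>EDITOR'S PREFACE</a></div> \r\n"
   . "\r\n"
   . "<div class='classit'><a href='site.php?site=4&filename=aname3'>MORE CONTENT</a></div> \n"
   . "<div class='classit'><a href='site.php?site=3&filename=aname4'>LAST LINE</a></div> ";

$lines = array();
// \R matches any line break sequence (\n, \r\n or \r).
foreach (preg_split('/\R/', $s) as $line) {
    // Remove the tags, then the leftover leading and trailing whitespace.
    $text = trim(strip_tags($line));
    if ($text !== '') {
        $lines[] = $text;
    }
}

print_r($lines);
```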

Answer 3 (score: 0)

You can use preg_match_all:

<?php
$html = <<<HTML
<div class='classit'><a href='site.php?site=1&filename=aname4'>EDITOR'S PREFACE</a></div>
<div class='classit'><a href='site.php?site=4&filename=aname3'>MORE CONTENT</a></div>
<div class='classit'><a href='site.php?site=3&filename=aname4'>LAST LINE</a></div>
HTML;

$result = array();

if (preg_match_all('/>([^><]+)(?=<\/a>)/', $html, $matches))
{
    $result = $matches[1];
}

print_r($result);
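
The pattern above captures the text in front of every closing </a> tag, wherever it appears in the input. If the surrounding page may contain other links, one possible variation (a sketch, not part of the original answer) is to tie the match to the enclosing div. The pattern and the $pattern name below are assumptions, and a regular expression like this remains sensitive to changes in attribute order or nesting.

```php
<?php
$html = <<<HTML
<div class='classit'><a href='site.php?site=1&filename=aname4'>EDITOR'S PREFACE</a></div>
<div class='classit'><a href='site.php?site=4&filename=aname3'>MORE CONTENT</a></div>
<div class='classit'><a href='site.php?site=3&filename=aname4'>LAST LINE</a></div>
HTML;

// Group 1 captures the quote character around the class value,
// group 2 captures the link text.
$pattern = '~<div\s+class=(["\'])classit\1>\s*<a\b[^>]*>([^<]+)</a>~i';

$result = array();
if (preg_match_all($pattern, $html, $matches)) {
    $result = $matches[2];
}

print_r($result);
```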