Find and extract recurring tagged strings from a text

Asked: 2014-01-07 08:40:27

Tags: php string substring

Suppose I have this text:

The <b>quick brown</b> fox jumps over the lazy dog
The quick brown fox jumps over the <b>lazy dog</b>
The quick brown fox <b>jumps over</b> the lazy dog

I want to find and extract every tagged string that occurs in the text above:

<b>quick brown</b>
<b>lazy dog</b>
<b>jumps over</b>

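I know I need a while loop that checks for the end of the text, plus some string functions, but I'm not sure which ones.

Any help would be appreciated.

3 answers:

Answer 0 (score: 0)

Do it like this: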

<?php
$html='The <b>quick brown</b> fox jumps over the lazy dog
The <b>quick brown</b> fox jumps over the lazy dog
The <b>quick brown</b> fox jumps over the lazy dog';

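// Re-wrap a captured inner string in its <b> tags.
function funcx($v)
{
    return "<b>".$v."</b>";
}

// $matches[1] holds the text captured between each pair of <b> tags.
preg_match_all('~<b>(.*?)</b>~', $html, $matches);
$results=array_map('funcx',$matches[1]);
var_dump($results);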

Output:

array (size=3)
  0 => string '<b>quick brown</b>' (length=18)
  1 => string '<b>quick brown</b>' (length=18)
  2 => string '<b>quick brown</b>' (length=18)
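
As a side note, the full matches in `$matches[0]` already include the tags, so the re-wrapping step can be skipped. The sketch below is a minimal variation on the answer, not the author's code, and it runs against the three lines from the question rather than the repeated sample above:

```php
<?php
// The three sample lines from the question.
$html = "The <b>quick brown</b> fox jumps over the lazy dog\n"
      . "The quick brown fox jumps over the <b>lazy dog</b>\n"
      . "The quick brown fox <b>jumps over</b> the lazy dog";

// $matches[0] holds each full match, tags included;
// $matches[1] holds only the text between the tags.
preg_match_all('~<b>(.*?)</b>~', $html, $matches);

$tagged = $matches[0];
print_r($tagged);
```

The lazy quantifier `.*?` is what keeps each match confined to a single pair of tags.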

Answer 1 (score: 0)

If you want to use a regular expression, try the following:

/<b ?.*>(.*)<\/b>/

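It matches everything inside the <b></b> tags, and the full match includes the tags themselves. Note that .* is greedy, so a line containing several <b> elements is matched as a single span running from the first opening tag to the last closing tag.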
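
To make the greedy behaviour concrete, here is a small comparison. The input line is a hypothetical example, and the lazy pattern is an alternative shown for contrast, not part of the original answer:

```php
<?php
// Hypothetical one-line input with two <b> elements.
$line = 'The <b>quick</b> brown <b>fox</b> jumps';

// Greedy: .* grabs as much as it can, so one match spans both elements.
preg_match_all('/<b ?.*>(.*)<\/b>/', $line, $greedy);

// Lazy: .*? stops at the first closing tag, giving one match per element.
preg_match_all('/<b>(.*?)<\/b>/', $line, $lazy);

print_r($greedy[0]); // one match: <b>quick</b> brown <b>fox</b>
print_r($lazy[0]);   // two matches: <b>quick</b> and <b>fox</b>
```

When each line holds at most one tagged string, as in the question, both patterns give the same result.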

You can extend the regex above to handle multiple <b> tags by wrapping it in a simple function and passing in the tag you want to capture:

Example:

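function getTextBetweenTags($string, $tagname)
{
    // Escape the tag name so regex metacharacters in it are treated literally.
    $tag = preg_quote($tagname, '/');
    // Lazy match, case-insensitive (i), and . also matches newlines (s).
    $pattern = '/<'.$tag.'>.*?<\/'.$tag.'>/is';
    preg_match_all($pattern, $string, $matches);
    return $matches;
}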

Usage:

$string = 'The <b>quick brown</b> fox jumps over the lazy dog
           The <b>quick black</b> fox jumps over the lazy dog
           The <b>quick white</b> fox jumps over the lazy dog';
$text = getTextBetweenTags($string, "b");
print_r($text);

Output:

Array
(
    [0] => Array
        (
            [0] => <b>quick brown</b>
            [1] => <b>quick black</b>
            [2] => <b>quick white</b>
        )

)

Edit 1:

I've extended the function above so that it works with multiple tags:

Example:

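function getTextBetweenTags($string, $tagsname)
{
    $results = array(); // initialise so an empty tag list still returns an array
    $tagsname = explode(',',$tagsname);
    foreach ($tagsname as $tagname) 
    {
        // Trim stray spaces ("b, strong") and escape regex metacharacters.
        $tag = preg_quote(trim($tagname), '/');
        $pattern = '/<'.$tag.'>.*?<\/'.$tag.'>/is';
        preg_match_all($pattern, $string, $matches);
        $results[] = $matches;
    }
    return $results;
}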

Usage:

$string = 'The <b>quick brown</b> fox jumps <strong>over</strong> the lazy dog
           The <b>quick black</b> fox jumps over the <span>lazy</span> dog
           The <b>quick white</b> fox jumps over the lazy dog';
$text = getTextBetweenTags($string, "b,strong,span"); // Single or multiple HTML tags
print_r($text);

Output:

Array
(
    [0] => Array
        (
            [0] => Array
                (
                    [0] => <b>quick brown</b>
                    [1] => <b>quick black</b>
                    [2] => <b>quick white</b>
                )

        )

    [1] => Array
        (
            [0] => Array
                (
                    [0] => <strong>over</strong>
                )

        )

    [2] => Array
        (
            [0] => Array
                (
                    [0] => <span>lazy</span>
                )

        )

)
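
The result above is nested three levels deep, which can be awkward to consume. As a possible alternative, the sketch below returns a flat map from tag name to matches. The helper name `getTaggedStrings` and its array parameter are hypothetical and not part of the answer:

```php
<?php
// Hypothetical helper (not from the answer): returns tag name => list of matches.
function getTaggedStrings(string $string, array $tagNames): array
{
    $results = [];
    foreach ($tagNames as $tagName) {
        // preg_quote() keeps any regex metacharacters in the name literal.
        $tag = preg_quote($tagName, '/');
        preg_match_all('/<' . $tag . '>.*?<\/' . $tag . '>/is', $string, $matches);
        // Keep only the full matches, keyed by the tag they belong to.
        $results[$tagName] = $matches[0];
    }
    return $results;
}

$string = "The <b>quick brown</b> fox jumps <strong>over</strong> the lazy dog\n"
        . "The <b>quick black</b> fox jumps over the <span>lazy</span> dog\n"
        . "The <b>quick white</b> fox jumps over the lazy dog";

$found = getTaggedStrings($string, ['b', 'strong', 'span']);
print_r($found);
```

Like the original function, this only handles tags without attributes and without nesting of the same tag.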

Answer 2 (score: 0)

$text = "The <b>quick brown</b> fox jumps over the lazy dog
The <b>quick brown</b> fox jumps over the lazy dog
The <b>quick brown</b> fox jumps over the lazy dog";
// The substring to look for must be known in advance.
$part = "<b>quick brown</b>";
// Count how many times it occurs, then print it that many times.
$count = substr_count($text, $part);
for($i=0;$i<$count;$i++)
{
    echo $part."<br>";
}

Output:

quick brown

quick brown

quick brown

If you replace

echo $part."<br>";

with

echo htmlspecialchars($part)."<br>";

Output:

<b>quick brown</b>
<b>quick brown</b>
<b>quick brown</b>
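
The `substr_count` approach only works when the substring is known in advance. For the while loop with string functions that the question asks about, something like the following may do. This is a sketch under the assumption that the tags are plain `<b>` and `</b>` with no attributes or nesting, and the helper name `extractBold` is hypothetical:

```php
<?php
// Hypothetical helper: collects every <b>...</b> substring using only string functions.
function extractBold(string $text): array
{
    $open = '<b>';
    $close = '</b>';
    $found = [];
    $offset = 0;

    // Keep searching from the end of the last match until no opening tag is left.
    while (($start = strpos($text, $open, $offset)) !== false) {
        $end = strpos($text, $close, $start + strlen($open));
        if ($end === false) {
            break; // unclosed tag: stop rather than guess
        }
        $end += strlen($close);
        $found[] = substr($text, $start, $end - $start);
        $offset = $end;
    }
    return $found;
}

$text = "The <b>quick brown</b> fox jumps over the lazy dog\n"
      . "The quick brown fox jumps over the <b>lazy dog</b>\n"
      . "The quick brown fox <b>jumps over</b> the lazy dog";

$parts = extractBold($text);
foreach ($parts as $part) {
    echo htmlspecialchars($part), "<br>";
}
```

Here `strpos` returning `false` plays the role of the end-of-text check, and `$offset` ensures each search resumes after the previous match.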