How to count the words in the inner text of tags and truncate the string, tags included, at a given word count

Posted: 2014-11-17 09:46:04

Tags: php regex

I have the following problem:

$str="i am a <b>software</b> <span style=\"color:red;\">engineer.</span>  i work at a company.";  // 10 words in total (counting inner text only)

I want to keep only the first 5 words, together with their tags, so that the output is:

$output="i am a <b>software</b> <span style=\"color:red;\">engineer.</span>";  // 5 words

How can this be done? Please help. Thanks.

I already have a function that truncates plain text to a number of words:

function word( $str, $wordCount = 10 ) {
    return implode(
        '',
        array_slice(
            preg_split(
                '/([\s,\.;\?\!]+)/',
                $str,
                $wordCount*2+1,
                PREG_SPLIT_DELIM_CAPTURE
            ),
            0,
            $wordCount*2-1
        )
    );
}

3 answers:

Answer 0 (score: 1)

Here is an example, but you will have to adjust it to the characters you allow in a word:

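<?php
// In a single-quoted PHP string \" is kept literally, so the double quotes are left unescaped.
$input = 'i am a <b>software</b> <span style="color:red;">engineer.</span>  i work at a company.';
// Keep up to 5 words (each with its surrounding tags) in group 1 and drop the rest.
$pattern = '#((?: \s* (<[^>]*>)* [a-z.-]+ (</[^>]*>)* ){0,5}).*#x';
$result = preg_replace($pattern, '$1', $input);
var_dump($result);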
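If the word count needs to vary, the same pattern can be wrapped in a small function. This is only a sketch: the name firstWords is hypothetical, the pattern is anchored and made case-insensitive (both are my additions, not part of the answer), and the character class [a-z.-] still has to be adapted to whatever counts as a word in your data.

```php
<?php
// Hypothetical helper built on the pattern from the answer above.
// Assumption: a "word" is a run of letters, dots and hyphens.
function firstWords(string $input, int $limit): string
{
    $limit = max(0, $limit); // a negative quantifier would be an invalid regex
    $pattern = '#^((?:\s*(?:<[^>]*>)*[a-z.-]+(?:</[^>]*>)*){0,' . $limit . '}).*$#is';
    // preg_replace() returns null on a PCRE error; fall back to the input then.
    return preg_replace($pattern, '$1', $input) ?? $input;
}

$input = 'i am a <b>software</b> <span style="color:red;">engineer.</span>  i work at a company.';
echo firstWords($input, 5), "\n";
// i am a <b>software</b> <span style="color:red;">engineer.</span>
```

Like the original pattern, this stops at the first character that is not allowed in a word (an apostrophe, for instance), so the result can be shorter than the requested number of words.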

Answer 1 (score: 1)

A more precise solution:

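<?php
// In a single-quoted PHP string \" is kept literally, so the double quotes are left unescaped.
$input = 'i am a <b>software</b> <span style="color:red;">engineer. And </span> i work at a company.';

var_dump(customParse($input, 5));
var_dump(customParse($input, 4));
var_dump(customParse($input, 3));

$input = 'i am a <b>software</b> <foo style="color:red;">engineer. And </foo> i work at a company.';

var_dump(customParse($input, 5));

function customParse($input, $limit) {
    // Group 2: name of the last opening tag in front of the word.
    // Group 3: last closing tag directly after the word.
    // The i flag lets [a-z.-] match capitalized words such as "And".
    $pattern = '#(
    \s*
    (?: <(\w+) [^>]* >)*
    [a-z.-]+
    (</[^>]*>)*
    )#xi';
    preg_match_all($pattern, $input, $matches);
    $result = '';
    // Never read past the number of words actually found.
    $limit = min($limit, count($matches[0]));
    for ($nbMatch = 0; $nbMatch < $limit; $nbMatch++) {
        $capturedText = $matches[0][$nbMatch];
        $openTag = $matches[2][$nbMatch];
        $closeTag = $matches[3][$nbMatch];

        $result .= $capturedText;

        // The word opened a tag without closing it: close it here.
        if ($openTag && !$closeTag) {
            $result .= '</' . $openTag . '>';
        }
    }

    return $result;
}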

Answer 2 (score: 0)

It is possible. You can use preg_match_all like this:

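<?php
// In a single-quoted PHP string \" is kept literally, so the double quotes are left unescaped.
$input = 'i am a <b>software</b> <span style="color:red;">engineer. And </span> i work at a company.';
// Group 2: last opening tag before the word; group 3: last closing tag after it.
// The i flag lets [a-z.-] match capitalized words such as "And".
$pattern = '#(
\s*
(<[^>]*>)*
[a-z.-]+
(</[^>]*>)*
)#xi';
preg_match_all($pattern, $input, $matches);
var_dump($matches);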

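Then, for each match, test whether $matches[2][index] is non-empty and $matches[3][index] is empty; if so, add the closing tag yourself. I think this is incomplete and error-prone, though. You will probably need to adapt it to cover every case.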
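One way to cover more cases is to stop matching words and tags with a single pattern and instead walk through the string token by token, keeping a stack of the tags that are currently open. The sketch below is not taken from any of the answers. The function name truncateWords and the short list of void elements are hypothetical, a "word" is assumed to be any run of non-whitespace characters outside a tag, and the markup is assumed to be reasonably well formed (no comments, no ">" inside attribute values).

```php
<?php
// Hypothetical helper: keep the first $limit words of $html, counting only
// text outside tags, and close every tag that is still open at the cut.
function truncateWords(string $html, int $limit): string
{
    $void = ['br', 'hr', 'img', 'input', 'wbr']; // elements without a closing tag (partial list)
    // Split into tags and text runs; the captured tags stay in the token list.
    $tokens = preg_split('/(<[^>]+>)/', $html, -1,
        PREG_SPLIT_DELIM_CAPTURE | PREG_SPLIT_NO_EMPTY);
    $result = '';
    $open = [];  // stack of currently open tag names
    $count = 0;  // words kept so far

    foreach ($tokens as $token) {
        if (preg_match('#^<(/?)(\w+)[^>]*>$#', $token, $m)) {
            $name = strtolower($m[2]);
            if ($m[1] === '/') {
                // Closing tag: pop it if it matches the innermost open tag.
                if (end($open) === $name) {
                    array_pop($open);
                }
            } else {
                // Do not open new elements once enough words were kept.
                if ($count >= $limit) {
                    break;
                }
                if (!in_array($name, $void, true) && substr($token, -2) !== '/>') {
                    $open[] = $name;
                }
            }
            $result .= $token;
            continue;
        }
        // Text run: copy whitespace as is, count every other piece as a word.
        $pieces = preg_split('/(\s+)/', $token, -1,
            PREG_SPLIT_DELIM_CAPTURE | PREG_SPLIT_NO_EMPTY);
        foreach ($pieces as $piece) {
            if (trim($piece) !== '') {
                if ($count >= $limit) {
                    break 2;
                }
                $count++;
            }
            $result .= $piece;
        }
    }

    $result = rtrim($result);
    // Close whatever is still open, innermost first.
    while ($open) {
        $result .= '</' . array_pop($open) . '>';
    }
    return $result;
}

$input = 'i am a <b>software</b> <span style="color:red;">engineer. And </span> i work at a company.';
echo truncateWords($input, 5), "\n";
// i am a <b>software</b> <span style="color:red;">engineer.</span>
```

Because the tags are tracked on a stack, the cut can fall in the middle of nested elements and every open element is still closed. For arbitrary real-world HTML, a DOM parser would be a safer basis than regular expressions.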