How to truncate a string to the first n words in PHP

Date: 2014-07-03 19:34:31

Tags: php html truncate

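I want to truncate a very long string that is formatted with HTML elements.

I need the first 500 words (somehow my truncation function has to skip HTML tags such as <p> and <br> when it counts), but the result must keep those HTML elements, because the output should be formatted with HTML tags just like the original full text.

What is the best way to truncate the string?

Example:

Original text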

> <p><a href="/t/the-huffington-post">The Huffington Post</a> (via <a
> href="/t/daily-mail">Daily Mail</a>) is reporting that <a
> href="/t/misty">Misty</a> has been returned to a high kill shelter for
> farting too much! She appeared on Greenville County Pet Rescue’s
> “urgent” list, which means if she doesn’t get readopted, she will be
> euthanized!</p>

I need the first n words (n = 10)

>  <p><a href="/t/the-huffington-post">The Huffington Post</a> (via <a
> href="/t/daily-mail">Daily Mail</a>) is reporting that.. </p>

2 Answers:

Answer 0: (score: 1)

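A brute-force approach is to split the input into tokens and iterate over them. Tag tokens are always output, while non-tag tokens are output and counted only until the maximum is reached. Something along these lines: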

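$string = "your string here";
$output = "";
$count = 0;
$max = 10;
// Split into tags and text runs, keeping the tags. Splitting on spaces alone
// would cut tags that carry attributes, e.g. <a href="...">, into pieces.
$tokens = preg_split('/(<[^>]*>)/', $string, -1, PREG_SPLIT_DELIM_CAPTURE | PREG_SPLIT_NO_EMPTY);
foreach ($tokens as $token)
{
  if (preg_match('/^<[^>]*>$/', $token)) {
    // Tags are always emitted and never counted as words.
    $output .= $token;
  } else {
    // Text run: copy whitespace and words, counting only the words.
    $pieces = preg_split('/(\s+)/', $token, -1, PREG_SPLIT_DELIM_CAPTURE | PREG_SPLIT_NO_EMPTY);
    foreach ($pieces as $piece) {
      if ($count >= $max) break;
      $output .= $piece;
      if (trim($piece) !== '') $count += 1;
    }
  }
}
print $output;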
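
The loop above keeps emitting every tag after the word limit is reached, so the result can end with a trail of empty elements. A minimal sketch of a variant that stops at the limit and closes whatever is still open could look like the following. The function name `truncateHtmlWords`, the `..` suffix, and the short list of void elements are assumptions made for illustration, and the sketch assumes PHP 7 or later and simple, well-formed fragments without comments or scripts.

```php
<?php
// Hypothetical helper (not from the original answer): keeps the first $max
// visible words of $html, then closes the tags that are still open.
function truncateHtmlWords(string $html, int $max, string $suffix = '..'): string
{
    // Assumption: a short, non-exhaustive list of elements that never get a closing tag.
    $void = ['br', 'hr', 'img', 'input', 'meta', 'link'];
    $open = [];          // stack of currently open tag names
    $out = '';
    $count = 0;
    $truncated = false;

    // Alternate between tags and text runs, keeping the tags.
    $tokens = preg_split('/(<[^>]*>)/', $html, -1, PREG_SPLIT_DELIM_CAPTURE | PREG_SPLIT_NO_EMPTY);
    foreach ($tokens as $token) {
        if (preg_match('/^<(\/?)([a-z][a-z0-9]*)\b[^>]*>$/i', $token, $m)) {
            $name = strtolower($m[2]);
            if ($m[1] === '/') {
                // Closing tag: pop the stack when it matches the innermost open tag.
                if ($open && end($open) === $name) {
                    array_pop($open);
                }
            } else {
                if ($count >= $max) {
                    // Limit already reached: do not open anything new.
                    $truncated = true;
                    break;
                }
                if (!in_array($name, $void, true) && substr($token, -2) !== '/>') {
                    $open[] = $name;
                }
            }
            $out .= $token;
            continue;
        }

        // Text run: copy whitespace and words, counting only the words.
        $pieces = preg_split('/(\s+)/', $token, -1, PREG_SPLIT_DELIM_CAPTURE | PREG_SPLIT_NO_EMPTY);
        foreach ($pieces as $piece) {
            if ($count >= $max) {
                if (trim($piece) === '') {
                    continue;        // drop whitespace after the last kept word
                }
                $truncated = true;   // at least one more word was cut off
                break 2;
            }
            $out .= $piece;
            if (trim($piece) !== '') {
                $count++;
            }
        }
    }

    if ($truncated) {
        $out .= $suffix;
    }
    // Close the remaining open tags, innermost first.
    while ($open) {
        $out .= '</' . array_pop($open) . '>';
    }
    return $out;
}

echo truncateHtmlWords('<p>one <b>two three</b> four five</p>', 3), "\n";
// <p>one <b>two three</b>..</p>
```

Words are counted per text run, so punctuation that is separated from its word by a tag, such as the `)` after `</a>` in the example text, counts as a word of its own.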

Answer 1: (score: 1)

A quick Google search found something like this:

  // Original PHP code by Chirp Internet: www.chirp.com.au
  // Please acknowledge use of this code by including this header.

  function restoreTags($input)
  {
    $opened = array();

    // loop through opened and closed tags in order
    if(preg_match_all("/<(\/?[a-z]+)>?/i", $input, $matches)) {
      foreach($matches[1] as $tag) {
        if(preg_match("/^[a-z]+$/i", $tag, $regs)) {
          // a tag has been opened
          if(strtolower($regs[0]) != 'br') $opened[] = $regs[0];
        } elseif(preg_match("/^\/([a-z]+)$/i", $tag, $regs)) {
          // a tag has been closed
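          // array_pop() takes its argument by reference, so store the keys in a variable first
          $keys = array_keys($opened, $regs[1]);
          if($keys) unset($opened[array_pop($keys)]);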
        }
      }
    }

    // close tags that are still open
    if($opened) {
      $tagstoclose = array_reverse($opened);
      foreach($tagstoclose as $tag) $input .= "</$tag>";
    }

    return $input;
  }

Combine it with another function mentioned in the same article:

  function truncateWords($input, $numwords, $padding="")
  {
    $output = strtok($input, " \n");
    while(--$numwords > 0) {
      $word = strtok(" \n");
      if($word === false) break; // input has fewer words than requested
      $output .= " " . $word;
    }
    if($output != $input) $output .= $padding;
    return $output;
  }

Then you can achieve what you are looking for like this:

$originalText = '...'; // some original text in HTML format
$output = truncateWords($originalText, 500); // This truncates to 500 words (ish...)
$output = restoreTags($output); // This fixes any open tags
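
The "(ish...)" in the comment is worth taking literally: `truncateWords` splits on spaces and newlines only, so fragments of markup such as `<a` or `href="..."` are counted as words. As a rough check, one could compare the number of space-separated tokens with the number of visible words. The helper name `countVisibleWords` below is hypothetical, and the sketch assumes PHP 7 or later.

```php
<?php
// Hypothetical helper: counts whitespace-separated words once markup is removed.
function countVisibleWords(string $html): int
{
    $text = trim(strip_tags($html));
    if ($text === '') {
        return 0;
    }
    return count(preg_split('/\s+/', $text));
}

$html = '<p><a href="/t/daily-mail">Daily Mail</a> is reporting</p>';

// Visible words: Daily, Mail, is, reporting
echo countVisibleWords($html), "\n";               // 4

// Tokens seen by a plain space/newline split, markup fragments included
echo count(preg_split('/[ \n]+/', $html)), "\n";   // 5
```

Note that `strip_tags` removes tags without inserting a space, so words in adjacent block elements with no whitespace between them are merged into one.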