Question

I use this function to find specific character sequences in a text string and convert them to HTML tags:

function ccfc($content)
{
    $reg_exUrl = "/(http|https|ftp|ftps)\:\/\/[a-zA-Z0-9\-\.]+\.[a-zA-Z]{2,3}(\/\S*)?/";

    // $code_block =  preg_replace($reg_exUrl, "<a href=".$url[0].">{$url[0]}</a> ", $content);
    if(preg_match($reg_exUrl, $content, $url)) {

           // make the urls hyper links
           $content = preg_replace($reg_exUrl, "<a href=".$url[0].">{$url[0]}</a> ", $content);

    } else {

           // if no urls in the text just return the text
           $content = $content;

        }



    $code_block = preg_replace_callback(
          '/([\`]{3})(.*?)([\`]{3})/s',
          function($matches) {
              $matches[2] = htmlentities($matches[2]);
              return '<pre><code>'. $matches[2] .'</code></pre>';
          },
          $content);

      $bold = preg_replace_callback(
                  '/([\*]{2})(.*?)([\*]{2})/s',
                  function($matches) {
                      $matches[2] = htmlentities($matches[2]);
                      return '<b>'. $matches[2] .'</b>';
                  },
                  $code_block);

      $italic = preg_replace_callback(
                  '/([\*]{1})(.*?)([\*]{1})/s',
                  function($matches) {
                      $matches[2] = htmlentities($matches[2]);
                      return '<i>'. $matches[2] .'</i>';
                  },
                  $bold);


    return $italic;

}

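The first step finds URLs such as http://www.google.com and converts them into links.

The second finds ```code content``` and converts it to <pre><code> code content </code></pre>. The third finds **content** and converts it to <b>content</b>, and the fourth finds *content* and converts it to <i>content</i>. However, any HTML or script written outside the ``` delimiters is not escaped, so the browser executes it. How can I apply htmlentities() to the rest of the text?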

Answer 1

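Call htmlentities on the text before you run it through the conversion steps, rather than inside each conversion:

function ccfc($content) {
    $content = htmlentities($content);
    // ... the URL, code block, bold, and italic conversions follow ...
}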

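This does not affect the characters used in your markup (* and `). You can also set the double_encode flag to false so that content that is already encoded (for example, &amp; entities in links) is not encoded a second time - see the PHP manual for the settings: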

$content = htmlentities($content, ENT_QUOTES, 'UTF-8', false);

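These settings treat the text as UTF-8 and encode both single and double quotes, but they will not double-encode entities that are already present, such as the &amp; in a link like http://example.com?p=1&amp;q=2. A raw & is still encoded once, which is what valid HTML requires.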
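To make the effect of the double_encode flag concrete, here is a small sketch. The sample string is made up for illustration; only htmlentities and its documented parameters are used.

```php
<?php
// Sample input (hypothetical): raw markup plus an entity that is already encoded.
$raw = '<script>alert("x")</script> Tom &amp; Jerry';

// double_encode = false: existing entities such as &amp; are left alone.
$once = htmlentities($raw, ENT_QUOTES, 'UTF-8', false);

// double_encode = true (the default): the & of &amp; is encoded again.
$twice = htmlentities($raw, ENT_QUOTES, 'UTF-8', true);

echo $once, PHP_EOL;   // ... Tom &amp; Jerry
echo $twice, PHP_EOL;  // ... Tom &amp;amp; Jerry
```

In both cases the <script> tag is escaped, so it is displayed as text instead of being run.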

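Separately, you don't need preg_replace_callback for these replacements; plain preg_replace lets you refer to the captured text in the replacement string. Here is the code-formatting regex as an example: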

$code_block = preg_replace(
      '/`{3}(.*?)`{3}/s',
      "<pre><code>$1</code></pre>",
      $content);

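As noted in my comment, <b> and <i> are discouraged as purely presentational markup. If you are using them to emphasize text, you can replace them with <strong> and <em> respectively; if the markup is only there for presentation, it is better to wrap the text in an element such as <span> and give it a class that has bold or italic formatting.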
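For the presentation-only case, the replacement could look roughly like this. The class name text-bold is a placeholder of mine, not something from the question; it would need a matching CSS rule such as .text-bold { font-weight: bold; }.

```php
<?php
// "text-bold" is a hypothetical class name; define it in your stylesheet.
$content = 'some **loud** text';

$content = preg_replace(
    '/\*{2}([^\*].*?)\*{2}/s',
    '<span class="text-bold">$1</span>',
    $content);

echo $content, PHP_EOL;
```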

Here is the full code, with htmlentities moved to the top and the callbacks replaced by preg_replace:

function ccfc($content)
{
    // escape everything first, so only the tags added below reach the browser as HTML
    $content = htmlentities($content, ENT_QUOTES, 'UTF-8', false);

    $reg_exUrl = "/((http|https|ftp|ftps)\:\/\/[a-zA-Z0-9\-\.]+\.[a-zA-Z]{2,3}(\/?\S*)?)/";

    // make the urls hyperlinks
    $content = preg_replace($reg_exUrl, "<a href='$1'>$1</a>", $content);

    # replace ``` with code blocks
    $content = preg_replace(
        '/`{3}(.*?)`{3}/s',
        "<pre><code>$1</code></pre>",
        $content);

    # replace **text** with strong text
    $content = preg_replace(
            '/\*{2}([^\*].*?)\*{2}/s',
            "<strong>$1</strong>",
            $content);

    # replace *text* with em text
    $content = preg_replace(
              '/\*(.*?)\*/s',
             "<em>$1</em>",
              $content);

    return $content;
}
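As a quick check of the behavior, the sketch below runs a condensed copy of the function on two sample strings. The name ccfc_demo and the inputs are mine, chosen only so the snippet can run on its own.

```php
<?php
// Condensed copy of the function above, renamed (hypothetical name) so it runs standalone.
function ccfc_demo($content)
{
    // escape all HTML before any tags are added
    $content = htmlentities($content, ENT_QUOTES, 'UTF-8', false);
    $content = preg_replace(
        '/((http|https|ftp|ftps)\:\/\/[a-zA-Z0-9\-\.]+\.[a-zA-Z]{2,3}(\/?\S*)?)/',
        "<a href='$1'>$1</a>",
        $content);
    $content = preg_replace('/`{3}(.*?)`{3}/s', '<pre><code>$1</code></pre>', $content);
    $content = preg_replace('/\*{2}([^\*].*?)\*{2}/s', '<strong>$1</strong>', $content);
    $content = preg_replace('/\*(.*?)\*/s', '<em>$1</em>', $content);
    return $content;
}

// Markup outside a code block is escaped instead of executed.
$outside = ccfc_demo('**bold** and <script>alert(1)</script>');
echo $outside, PHP_EOL;

// Markup inside a code block is escaped as well, then wrapped in <pre><code>.
$inside = ccfc_demo('```<b>x</b>```');
echo $inside, PHP_EOL;
```

Because the escaping happens once at the top, the individual replacements no longer need their own htmlentities calls.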

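A quick explanation of how preg_replace works: when you use parentheses in a regular expression, whatever matches inside them is captured into the special variables $1, $2, $3, and so on. The content of the first set of parentheses is in $1, the content of the second is in $2, and so forth. For example, take this regular expression: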

/(\w+) and (\w+)/

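and the input string bread and butter. bread matches the expression in the first set of parentheses and butter matches the one in the second, so $1 is set to bread and $2 to butter. This is useful with preg_replace because we can use $1 and $2 in the replacement string: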

$str = preg_replace("/(\w+) and (\w+)/", "I love $2 on $1", "bread and butter");
echo $str;

Output:

I love butter on bread

Anything in the matched string that is not captured disappears, like the and in this example.

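In the replacements in the code above, the text between the delimiters (* and `) needs to be kept, so it is captured in parentheses; the delimiters themselves are not needed, so they are left outside the parentheses.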
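The same point on a tiny input of my own choosing: the asterisks are matched but not captured, so only the inner text makes it into the result.

```php
<?php
// Only the parenthesized part (the text between the asterisks) is available as $1;
// the ** delimiters are matched but not captured, so they are dropped.
$out = preg_replace('/\*{2}(.*?)\*{2}/s', '<strong>$1</strong>', 'a **b** c');
echo $out, PHP_EOL;

// preg_match exposes the same groups as array entries:
// index 0 is the whole match, index 1 is the first group.
preg_match('/\*{2}(.*?)\*{2}/s', 'a **b** c', $m);
echo $m[0], ' / ', $m[1], PHP_EOL;
```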

Notes on the other characters in the regular expressions:

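?, *, +, {2}: these are quantifiers - they determine how many times the preceding pattern should occur. ? means 0 or 1 times; * means 0 or more times; + means one or more times; {2} means exactly two times; {500} means exactly 500 times.
\w stands for any digit, letter, or _
. matches any character (newlines are included only when the s modifier is set, as it is in the patterns above)
.*? matches a string of any length, including length 0, and stops at the earliest point where the rest of the pattern can match
\** matches 0 or more * characters; to match a literal *, you must escape it (i.e. \*) so that the regex engine does not interpret it as a quantifier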
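Two of these points are easy to see by experiment: the difference between .* and .*?, and what the s modifier changes. The sample strings below are made up for illustration.

```php
<?php
$text = '*a* and *b*';

// Greedy: .* grabs as much as possible, so the match runs to the LAST asterisk.
preg_match('/\*(.*)\*/', $text, $greedy);

// Lazy: .*? stops at the first asterisk that lets the pattern finish.
preg_match('/\*(.*?)\*/', $text, $lazy);

echo $greedy[1], PHP_EOL; // a* and *b
echo $lazy[1], PHP_EOL;   // a

// Without the s modifier, . does not match a newline, so a code block
// that spans several lines is not found.
$block = "```line1\nline2```";
$withoutS = preg_match('/`{3}(.*?)`{3}/', $block, $m1);
$withS    = preg_match('/`{3}(.*?)`{3}/s', $block, $m2);

echo $withoutS, ' ', $withS, PHP_EOL; // 0 1
```

This is why the patterns in the function use the lazy .*? together with the s modifier: each pair of delimiters is matched separately, and multi-line code blocks still work.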
