Question

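As I see it, the 'strlen' function should return only the number of characters in a string. Nothing else. That holds whether they are ASCII or Unicode characters. A character is a character, pointing to a given position in the ASCII table or the UTF-8 table. Nothing more.

If, for whatever reason, you want to know the length of a string in bytes, you should use a different function. I'm new to PHP scripting, so I haven't found that function yet. (Shouldn't it be 'bytelen()'?)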

Answer 1

mb_strlen() does what you want.
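
To illustrate the difference, here is a minimal sketch. It assumes the input is UTF-8 and that strlen() has not been overloaded; the sample string and variable names are made up for the example, and mb_strlen() is only called if the mbstring extension is available.

```php
<?php
// "héllo" written with explicit bytes so the example does not depend
// on the encoding of the source file.
$s = "h\xC3\xA9llo";

// strlen() counts bytes: the "é" takes two bytes in UTF-8.
$bytes = strlen($s);

// mb_strlen() counts characters for the given encoding (needs mbstring).
$chars = function_exists('mb_strlen') ? mb_strlen($s, 'UTF-8') : null;

echo "bytes: ", $bytes, "\n";
echo "chars: ", var_export($chars, true), "\n";
```

With mbstring loaded this should report 6 bytes and 5 characters for the same string.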

Answer 2

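Yes, that would be the most logical design. However, PHP was not planned from the start to support multi-byte character sets. Instead, it has evolved over the years in a rather chaotic way. You have tagged your question as PHP 4, but even PHP 5 doesn't have decent Unicode support (and I don't think that will change in the near future).

Anyway, there are several reasons for this:

PHP is not a closed-source commercial product owned by a company with a centralized design governed by corporate rules.
PHP was released in 1995 as a personal project by someone who needed some features for his static home page: at the time, it did not need Unicode support.
If you modify core functions such as strlen(), you must do so in a way that doesn't break existing functionality. That is not easy. Writing new, separate functions is much easier.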

Update

Sorry, I forgot the second part of your question. If you need to handle Unicode strings, you must use a separate set of functions:

http://es.php.net/manual/en/book.mbstring.php
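
As a rough sketch of what "a separate set of functions" means in practice, the following compares a byte-oriented call with its mbstring counterparts. It assumes UTF-8 input; the sample word is hypothetical, and the mb_* calls are skipped when the extension is not loaded.

```php
<?php
// Sample word "naïve" in UTF-8 (the "ï" is the two bytes C3 AF).
$s = "na\xC3\xAFve";

if (extension_loaded('mbstring')) {
    // Character-aware: the first three characters, "naï".
    $head = mb_substr($s, 0, 3, 'UTF-8');
    // Character-aware upper-casing: "NAÏVE".
    $upper = mb_strtoupper($s, 'UTF-8');
    // Character position of "v", which differs from its byte offset.
    $pos = mb_strpos($s, 'v', 0, 'UTF-8');
    echo $head, "\n", $upper, "\n", $pos, "\n";
} else {
    echo "mbstring is not loaded\n";
}

// The byte-oriented substr() cuts the "ï" in half and leaves a
// dangling lead byte at the end of the result.
$broken = substr($s, 0, 3);
echo bin2hex($broken), "\n";
```

The general pattern is that each byte-oriented string function has an mb_ counterpart that takes an encoding argument.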

You may also find these chapters interesting:

Pay attention to the PHP version required by each function you plan to use; PHP 4 is very old.

Answer 3

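If I haven't misunderstood you, strlen() is your 'bytelen()', as already mentioned in the other replies here.

strlen() itself does not support UTF-8 or other multi-byte character sets; if you want a proper strlen(), you need mb_strlen().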
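
If mbstring is not available, one possible workaround is to count every byte that is not a UTF-8 continuation byte. This is only a sketch: the helper name utf8CharCount is hypothetical, and it assumes the input is valid UTF-8 and that strlen() counts bytes.

```php
<?php
/**
 * Hypothetical helper: count UTF-8 characters without mbstring.
 * Assumes $str is valid UTF-8 and that strlen() returns bytes.
 */
function utf8CharCount($str)
{
    $count = 0;
    $length = strlen($str);
    for ($i = 0; $i < $length; $i++) {
        // Continuation bytes look like 10xxxxxx; every other byte
        // starts a new character.
        if ((ord($str[$i]) & 0xC0) != 0x80) {
            $count++;
        }
    }
    return $count;
}

echo utf8CharCount("h\xC3\xA9llo"), "\n"; // "héllo": 6 bytes, 5 characters
echo utf8CharCount("\xE2\x82\xAC"), "\n"; // euro sign: 3 bytes, 1 character
```

Note that this counts code points, so a letter followed by a combining accent is counted as two.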

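Pentium10's strBytes($str) function, from skimming it (I haven't tested it), looks like a good option if you know your encoding is UTF-8 and you are stuck with a very old version of PHP 4 for some reason.

(I suggest looking at Álvaro G. Vicario's post for the reasons behind this behavior. Proper native UTF-8 support is expected to come with PHP 6.)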

Answer 4

    /**
     * Count the number of bytes of a given string. 
     * Input string is expected to be ASCII or UTF-8 encoded. 
     * Warning: the function doesn't return the number of chars 
     * in the string, but the number of bytes. 
     * 
     * @param string $str The string to compute number of bytes 
     * 
     * @return int The length in bytes of the given string.
     */ 
    function strBytes($str) 
    { 
      // STRINGS ARE EXPECTED TO BE IN ASCII OR UTF-8 FORMAT 

      // Length reported by strlen(): a character count only if mbstring
      // function overloading is active, otherwise already a byte count
      $strlen_var = strlen($str);

      // string bytes counter 
      $d = 0; 

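      /*
       * Walk the string one UTF-8 sequence at a time: $d is the byte offset
       * of the current lead byte and ends up as the total byte count.
       * The isset() check stops the loop at the end of the data when
       * strlen() has already returned bytes rather than characters.
       */
      for ($c = 0; $c < $strlen_var && isset($str[$d]); ++$c) {

          $ord_var_c = ord($str[$d]);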

          switch (true) { 
              case (($ord_var_c >= 0x20) && ($ord_var_c <= 0x7F)): 
                  // characters U-00000000 - U-0000007F (same as ASCII) 
                  $d++; 
                  break; 

              case (($ord_var_c & 0xE0) == 0xC0): 
                  // characters U-00000080 - U-000007FF, mask 110XXXXX 
                  // see http://www.cl.cam.ac.uk/~mgk25/unicode.html#utf-8 
                  $d+=2; 
                  break; 

              case (($ord_var_c & 0xF0) == 0xE0): 
                  // characters U-00000800 - U-0000FFFF, mask 1110XXXX 
                  // see http://www.cl.cam.ac.uk/~mgk25/unicode.html#utf-8 
                  $d+=3; 
                  break; 

              case (($ord_var_c & 0xF8) == 0xF0): 
                  // characters U-00010000 - U-001FFFFF, mask 11110XXX 
                  // see http://www.cl.cam.ac.uk/~mgk25/unicode.html#utf-8 
                  $d+=4; 
                  break; 

              case (($ord_var_c & 0xFC) == 0xF8): 
                  // characters U-00200000 - U-03FFFFFF, mask 111110XX 
                  // see http://www.cl.cam.ac.uk/~mgk25/unicode.html#utf-8 
                  $d+=5; 
                  break; 

              case (($ord_var_c & 0xFE) == 0xFC): 
                  // characters U-04000000 - U-7FFFFFFF, mask 1111110X 
                  // see http://www.cl.cam.ac.uk/~mgk25/unicode.html#utf-8 
                  $d+=6; 
                  break; 
              default: 
                $d++;    
          } 
      } 

      return $d; 
    }
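
To make the lead-byte masks used in that switch easier to follow, here is a small sketch that walks a sample string sequence by sequence. The helper name utf8SequenceLength and the sample data are hypothetical; it handles only the 1- to 4-byte forms and assumes valid UTF-8 and a strlen() that counts bytes.

```php
<?php
/**
 * Hypothetical helper: length in bytes of the UTF-8 sequence that starts
 * with the given lead byte. Anything that is not a 2-, 3- or 4-byte lead
 * (ASCII, or an unexpected byte) is treated as a single byte.
 */
function utf8SequenceLength($leadByte)
{
    $o = ord($leadByte);
    if (($o & 0xE0) == 0xC0) { return 2; } // 110xxxxx
    if (($o & 0xF0) == 0xE0) { return 3; } // 1110xxxx
    if (($o & 0xF8) == 0xF0) { return 4; } // 11110xxx
    return 1;
}

// "a", "é", the euro sign and an emoji, written as explicit bytes.
$s = "a\xC3\xA9\xE2\x82\xAC\xF0\x9F\x98\x80";
$total = strlen($s);
$offset = 0;
$lengths = array();
while ($offset < $total) {
    $n = utf8SequenceLength($s[$offset]);
    $lengths[] = $n;
    $offset += $n;
}

echo implode(',', $lengths), "\n"; // sequence lengths: 1,2,3,4
echo $offset, ' ', $total, "\n";   // both are 10
```

For valid UTF-8 the summed sequence lengths match what a byte-counting strlen() reports.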

Should there be something like 'bytelen' (as well as 'strlen')?
