Question

Currently, I am using the following ucwords-based function to capitalize the letters that follow hyphens, dots, and apostrophes:

function ucwordsMore ($str){
    $str = ucwords($str);
    $str = str_replace('- ','-',ucwords(str_replace('-','- ',$str)));  // hyphens
    $str = str_replace('. ','.',ucwords(str_replace('.','. ',$str)));  // dots
    $str = preg_replace("/\w[\w']*/e", "ucwords('\\0')", $str);        // apostrophes

    return $str;
}

It works fine for English letters. However, non-English letters are not handled correctly. For example, this text:

La dernière usine française d'accordéons reste à Tulle

is turned into this text:

La DernièRe Usine FrançAise D'accordéOns Reste à Tulle

But I need it to be:

La Dernière Usine Française D'Accordéons Reste À Tulle

Any ideas?

Answer 1

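You probably need to call setlocale with LC_CTYPE beforehand for this kind of conversion to be done correctly, but there is also the matter of string encoding: ucwords is only meant to work on text in single-byte encodings.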
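To illustrate the encoding point, here is a minimal sketch, assuming PHP 7+ (for the \u{...} escape) and the mbstring extension. It only shows that a UTF-8 "é" occupies two bytes, which is why byte-oriented functions cannot treat it as a single letter; the variable names are made up for the example.

```php
<?php
// "é" (U+00E9) written with an escape so the result does not depend on
// how this source file is saved.
$e = "\u{E9}";

$byteLength = strlen($e);              // counts bytes: 2 in UTF-8
$charLength = mb_strlen($e, 'UTF-8');  // counts characters: 1
$hex        = bin2hex($e);             // the two bytes: c3a9
$upper      = mb_strtoupper($e, 'UTF-8'); // multibyte-aware uppercase: "É"

echo $byteLength, " ", $charLength, " ", $hex, " ", $upper, "\n";
```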

Answer 2

As @Jon mentioned, you need to use a locale, which defines the upper/lower case relationships for the function calls that rely on it. Usually that is LC_CTYPE.

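There are also constants for numeric behavior, collation, currency, and so on. The locale has to be installed on your machine, or be made available through a plugin, module, or similar. Read up on it.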

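I don't know PHP locales at all, so here is an example in Perl that uses a different regex approach from yours. I couldn't figure out your solution, but hopefully you can get some ideas from mine.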

use locale;
use POSIX qw(locale_h);

setlocale(LC_CTYPE, "en_US");

$str = "La dernière usine française d'accordéons reste à Tulle";

$str =~ s/ (?:^|(?<=\s)|(?<=\w-)|(?<=\w\.)|(?<=\w\')) (\w) / uc($1) /xeg;

print "$str\n";

Output

La Dernière Usine Française D'Accordéons Reste À Tulle

The regex

The form is s/// (find and replace)

s/                  # Search

  (?:                  # Group
      ^                   # beginning of string
    | (?<=\s)             # or, lookbehind \s
    | (?<=\w-)            # or, lookbehind \w-
    | (?<=\w\.)           # or, lookbehind \w\.
    | (?<=\w\')           # or, lookbehind \w\'
  )                    # End group
  (\w)                 # Capture group 1, a single word char

/                   # Replace
  uc($1)               # Uppercased word char from capt grp 1

/xeg;               # Modifiers x(expanded), e(eval), g(global)
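For PHP readers, the same lookbehind idea can be sketched roughly as follows. This is not from the answer above: the function name ucwordsUtf8 is hypothetical, the input is assumed to be UTF-8, the mbstring extension is assumed to be loaded, and \p{L} is swapped in for \w so that the match does not depend on the locale.

```php
<?php
// Hypothetical helper (the name is made up for this sketch).
// Uppercases a letter at the start of the string, after whitespace,
// or after a letter followed by a hyphen, dot, or apostrophe.
function ucwordsUtf8(string $str): string
{
    $result = preg_replace_callback(
        // /u makes PCRE treat the subject as UTF-8; \p{L} is any letter.
        "/(?:^|(?<=\\s)|(?<=\\p{L}[-.']))\\p{L}/u",
        static function (array $m): string {
            return mb_strtoupper($m[0], 'UTF-8');
        },
        $str
    );

    // preg_replace_callback returns null on error (e.g. invalid UTF-8);
    // fall back to the untouched input in that case.
    return $result ?? $str;
}

// Assumes this file is saved as UTF-8.
echo ucwordsUtf8("la dernière usine française d'accordéons reste à tulle"), "\n";
// La Dernière Usine Française D'Accordéons Reste À Tulle
```

Like ucwords, this sketch leaves all other letters as they are; it does not lowercase anything.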

Answer 3

Take a look at the Kohana UTF8 class - http://kohanaframework.org/3.2/guide/api/UTF8

Answer 4

Use this:

function mb_ucwords ($string)
{
    return mb_convert_case ($string, MB_CASE_TITLE, 'UTF-8'); 
}

It works like ucwords for non-English characters.
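One difference from ucwords is worth checking before adopting this: MB_CASE_TITLE also lowercases the remaining letters of each word, whereas ucwords leaves them untouched. A small sketch, assuming PHP 7+ and the mbstring extension (the variable names are made up for the example):

```php
<?php
// Inputs use \u{...} escapes so the example does not depend on the
// encoding of this source file.
$fromUpper = mb_convert_case("LA DERNI\u{C8}RE USINE", MB_CASE_TITLE, 'UTF-8');
$fromLower = mb_convert_case("\u{E0} tulle", MB_CASE_TITLE, 'UTF-8');

echo $fromUpper, "\n"; // La Dernière Usine
echo $fromLower, "\n"; // À Tulle
```

How title casing treats letters after an apostrophe or hyphen may vary between PHP versions, so that case is best tested on the version you actually run.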
