I'm looking for a good JavaScript RegEx to convert names to proper case. For example:
John SMITH = John Smith
Mary O'SMITH = Mary O'Smith
E.t MCHYPHEN-SMITH = E.T McHyphen-Smith
John Middlename SMITH = John Middlename Smith
Well, you get the idea.
Has anyone come up with a comprehensive solution?
Answer 0 (score: 1)
Wimps!.... Here's my second attempt. It handles "John SMITH", "Mary O'SMITH", "John Middlename SMITH", "E.t MCHYPHEN-SMITH" and "JoHn-JOE MacDoNAld".
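    // Requires: using System.Text.RegularExpressions;
    // Verbatim string so \w is not read as a C# escape; IgnoreCase so "MC", "Mc" and "Mac" all match.
    Regex fixnames = new Regex(@"(Ma?c)?(\w)(\w*)(\W*)", RegexOptions.IgnoreCase);
    string newName = fixnames.Replace(badName, NameFixer);

    static public string NameFixer(Match match)
    {
        string mc = "";
        if (match.Groups[1].Captures.Count > 0)
        {
            // Normalize the prefix: three characters means "Mac", otherwise "Mc".
            if (match.Groups[1].Captures[0].Length == 3)
                mc = "Mac";
            else
                mc = "Mc";
        }
        return mc
            + match.Groups[2].Captures[0].Value.ToUpper()   // first letter of the word
            + match.Groups[3].Captures[0].Value.ToLower()   // rest of the word
            + match.Groups[4].Captures[0].Value;            // trailing separators, unchanged
    }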
Note: by the time I realized you wanted a JavaScript solution rather than a .NET one, I was having too much fun to stop....
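Since the question asked for JavaScript, here is a rough sketch of how the same four-group pattern might be ported. The function name fixNames is made up for illustration, and the case-insensitive flag is an assumption needed for "MC" and "Mac" to both match. Treat it as a starting point rather than a finished solution.

```javascript
// Hypothetical JavaScript port of the .NET snippet above (fixNames is an illustrative name).
function fixNames(badName) {
  // Groups: optional Mc/Mac prefix, first letter, rest of the word, trailing non-word characters.
  // The "i" flag is an assumption so that "MC", "Mc" and "MAC" are all recognized.
  var pattern = /(ma?c)?(\w)(\w*)(\W*)/gi;
  return badName.replace(pattern, function (whole, prefix, first, rest, tail) {
    var mc = "";
    if (prefix) {
      // Three characters means "Mac", otherwise "Mc".
      mc = prefix.length === 3 ? "Mac" : "Mc";
    }
    return mc + first.toUpperCase() + rest.toLowerCase() + tail;
  });
}
```

Like the .NET version, this treats any word starting with "Mac" as a prefixed name, so fixNames("MACK") comes back as "MacK". An exception list would be needed to avoid that.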
Answer 1 (score: 1)
Something like this?
    function fix_name(name) {
        // Capitalize an optional Mc/Mac prefix and the word that follows it.
        var replacer = function (whole, prefix, word) {
            var ret = [];
            if (prefix) {
                ret.push(prefix.charAt(0).toUpperCase());
                ret.push(prefix.slice(1).toLowerCase());
            }
            ret.push(word.charAt(0).toUpperCase());
            ret.push(word.slice(1).toLowerCase());
            return ret.join('');
        };
        var pattern = /\b(ma?c)?([a-z]+)/ig;
        return name.replace(pattern, replacer);
    }
Answer 2 (score: 1)
Mark Summerfield has done this work with Lingua::EN::NameCase:
    KEITH              Keith
    LEIGH-WILLIAMS     Leigh-Williams
    MCCARTHY           McCarthy
    O'CALLAGHAN        O'Callaghan
    ST. JOHN           St. John
    VON STREIT         von Streit
    VAN DYKE           van Dyke
    AP LLWYD DAFYDD    ap Llwyd Dafydd
    henry viii         Henry VIII
    louis xiv          Louis XIV
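The above is written in Perl, but it makes heavy use of regular expressions, so you should be able to glean some good techniques from it.
Here is the relevant source: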
    sub nc {
        croak "Usage: nc [[\\]\$SCALAR]"
            if scalar @_ > 1 or ( ref $_[0] and ref $_[0] ne 'SCALAR' ) ;
        local( $_ ) = @_ if @_ ;
        $_ = ${$_} if ref( $_ ) ;           # Replace reference with value.
        $_ = lc ;                           # Lowercase the lot.
        s{ \b (\w) }{\u$1}gox ;             # Uppercase first letter of every word.
        s{ (\'\w) \b }{\L$1}gox ;           # Lowercase 's.
        # Name case Mcs and Macs - taken straight from NameParse.pm incl. comments.
        # Exclude names with 1-2 letters after prefix like Mack, Macky, Mace
        # Exclude names ending in a,c,i,o, or j are typically Polish or Italian
        if ( /\bMac[A-Za-z]{2,}[^aciozj]\b/o or /\bMc/o ) {
            s/\b(Ma?c)([A-Za-z]+)/$1\u$2/go ;
            # Now correct for "Mac" exceptions
            s/\bMacEvicius/Macevicius/go ;  # Lithuanian
            s/\bMacHado/Machado/go ;        # Portuguese
            s/\bMacHar/Machar/go ;
            s/\bMacHin/Machin/go ;
            s/\bMacHlin/Machlin/go ;
            s/\bMacIas/Macias/go ;
            s/\bMacIulis/Maciulis/go ;
            s/\bMacKie/Mackie/go ;
            s/\bMacKle/Mackle/go ;
            s/\bMacKlin/Macklin/go ;
            s/\bMacQuarie/Macquarie/go ;
            s/\bMacOmber/Macomber/go ;
            s/\bMacIn/Macin/go ;
            s/\bMacKintosh/Mackintosh/go ;
            s/\bMacKen/Macken/go ;
            s/\bMacHen/Machen/go ;
            s/\bMacisaac/MacIsaac/go ;
            s/\bMacHiel/Machiel/go ;
            s/\bMacIol/Maciol/go ;
            s/\bMacKell/Mackell/go ;
            s/\bMacKlem/Macklem/go ;
            s/\bMacKrell/Mackrell/go ;
            s/\bMacLin/Maclin/go ;
            s/\bMacKey/Mackey/go ;
            s/\bMacKley/Mackley/go ;
            s/\bMacHell/Machell/go ;
            s/\bMacHon/Machon/go ;
        }
        s/Macmurdo/MacMurdo/go ;
        # Fixes for "son (daughter) of" etc. in various languages.
        s{ \b Al(?=\s+\w) }{al}gox ;            # al Arabic or forename Al.
        s{ \b Ap \b }{ap}gox ;                  # ap Welsh.
        s{ \b Ben(?=\s+\w) }{ben}gox ;          # ben Hebrew or forename Ben.
        s{ \b Dell([ae])\b }{dell$1}gox ;       # della and delle Italian.
        s{ \b D([aeiu]) \b }{d$1}gox ;          # da, de, di Italian; du French.
        s{ \b De([lr]) \b }{de$1}gox ;          # del Italian; der Dutch/Flemish.
        s{ \b El \b }{el}gox unless $SPANISH ;  # el Greek or El Spanish.
        s{ \b La \b }{la}gox unless $SPANISH ;  # la French or La Spanish.
        s{ \b L([eo]) \b }{l$1}gox ;            # lo Italian; le French.
        s{ \b Van(?=\s+\w) }{van}gox ;          # van German or forename Van.
        s{ \b Von \b }{von}gox ;                # von Dutch/Flemish
        # Fixes for roman numeral names, e.g. Henry VIII, up to 89, LXXXIX
        s{ \b ( (?: [Xx]{1,3} | [Xx][Ll] | [Ll][Xx]{0,3} )?
                (?: [Ii]{1,3} | [Ii][VvXx] | [Vv][Ii]{0,3} )? ) \b }{\U$1}gox ;
        $_ ;
    }
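For anyone wanting to borrow these techniques in JavaScript, the sketch below ports a subset of the steps: lowercase everything, capitalize each word, fix 's, handle Mc/Mac with a few exceptions, lowercase some particles, and uppercase roman numerals. The names nameCase, upperFirst and MAC_EXCEPTIONS are hypothetical, the exception list is only a small sample of the Perl one, and several rules (Al, Ben, El, La, Le/Lo, the $SPANISH switch, MacMurdo) are left out. It is a partial illustration, not a faithful reimplementation of Lingua::EN::NameCase.

```javascript
// Hedged partial port of the Perl routine above. nameCase, upperFirst and
// MAC_EXCEPTIONS are illustrative names; the list is a sample, not the full set.
var MAC_EXCEPTIONS = ["Machado", "Macias", "Mackie", "Macklin", "Macomber", "Mackintosh"];

function upperFirst(word) {
  return word.charAt(0).toUpperCase() + word.slice(1);
}

function nameCase(name) {
  var s = name.toLowerCase();                                          // Lowercase the lot.
  s = s.replace(/\b\w/g, function (c) { return c.toUpperCase(); });    // Uppercase first letter of every word.
  s = s.replace(/'\w\b/g, function (m) { return m.toLowerCase(); });   // Lowercase 's.

  // Mc/Mac handling, using the same guard as the Perl source.
  if (/\bMac[A-Za-z]{2,}[^aciozj]\b/.test(s) || /\bMc/.test(s)) {
    s = s.replace(/\b(Ma?c)([A-Za-z]+)/g, function (whole, prefix, rest) {
      return prefix + upperFirst(rest);
    });
    // Undo the capital for known exceptions, e.g. "MacKie" back to "Mackie".
    MAC_EXCEPTIONS.forEach(function (right) {
      var wrong = "Mac" + upperFirst(right.slice(3));
      s = s.replace(new RegExp("\\b" + wrong, "g"), right);
    });
  }

  // A few of the "son (daughter) of" particles.
  s = s.replace(/\bAp\b/g, "ap")
       .replace(/\bDell([ae])\b/g, "dell$1")
       .replace(/\bD([aeiu])\b/g, "d$1")
       .replace(/\bDe([lr])\b/g, "de$1")
       .replace(/\bVan(?=\s+\w)/g, "van")
       .replace(/\bVon\b/g, "von");

  // Roman numerals up to LXXXIX, e.g. "Viii" becomes "VIII".
  s = s.replace(/\b((?:x{1,3}|xl|lx{0,3})?(?:i{1,3}|i[vx]|vi{0,3})?)\b/gi, function (m) {
    return m.toUpperCase();
  });
  return s;
}
```

With this sketch, nameCase("henry viii") gives "Henry VIII" and nameCase("MCCARTHY") gives "McCarthy". One caveat: the roman numeral rule also uppercases short names made only of those letters, so a surname such as "Li" comes out as "LI".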
Answer 3 (score: 0)
Unfortunately, there are far too many different name formats to get this completely right. John-Joe MacDonald is always going to be a pain!
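Answer 4 (score: 0)
Agreed, it's never going to be perfect, but I'm hoping to cover the most common cases. It's pretty much camel-casing any "word" and, I guess, treating hyphens and apostrophes as whitespace.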
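A minimal sketch of that idea, assuming plain ASCII letters: lowercase the whole string, then uppercase any letter that starts the string or follows whitespace, a hyphen, or an apostrophe. The name titleCaseWords is made up for illustration.

```javascript
// Illustrative only: titleCaseWords is a hypothetical name, and only a-z is handled.
function titleCaseWords(name) {
  // Lowercase everything, then uppercase a letter at the start of the string
  // or right after whitespace, an apostrophe, or a hyphen.
  return name.toLowerCase().replace(/(^|[\s'-])([a-z])/g, function (whole, sep, letter) {
    return sep + letter.toUpperCase();
  });
}
```

This covers "Mary O'SMITH" and "JoHn-JOE", but it knows nothing about Mc/Mac prefixes, so "MCHYPHEN-SMITH" becomes "Mchyphen-Smith", and a possessive such as "john's" ends up as "John'S".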