我需要language tag定义的BCP 47的正则表达式。
我知道完整的BNF语法可以在http://www.rfc-editor.org/rfc/bcp/bcp47.txt获得,并且我可以用它来编写我自己的语法,但希望已经有一个。
答案 0 :(得分:16)
看起来像这样:
^((?<grandfathered>(en-GB-oed|i-ami|i-bnn|i-default|i-enochian|i-hak|i-klingon|i-lux|
i-mingo|i-navajo|i-pwn|i-tao|i-tay|i-tsu|sgn-BE-FR|sgn-BE-NL|sgn-CH-DE)|(art-lojban|
cel-gaulish|no-bok|no-nyn|zh-guoyu|zh-hakka|zh-min|zh-min-nan|zh-xiang))|((?<language>
([A-Za-z]{2,3}(-(?<extlang>[A-Za-z]{3}(-[A-Za-z]{3}){0,2}))?)|[A-Za-z]{4}|[A-Za-z]{5,8})
(-(?<script>[A-Za-z]{4}))?(-(?<region>[A-Za-z]{2}|[0-9]{3}))?(-(?<variant>[A-Za-z0-9]{5,8}
|[0-9][A-Za-z0-9]{3}))*(-(?<extension>[0-9A-WY-Za-wy-z](-[A-Za-z0-9]{2,8})+))*
(-(?<privateUse>x(-[A-Za-z0-9]{1,8})+))?)|(?<privateUse>x(-[A-Za-z0-9]{1,8})+))$
以下是生成它的代码(在C#中):
var regular = "(art-lojban|cel-gaulish|no-bok|no-nyn|zh-guoyu|zh-hakka|zh-min|zh-min-nan|zh-xiang)";
var irregular = "(en-GB-oed|i-ami|i-bnn|i-default|i-enochian|i-hak|i-klingon|i-lux|i-mingo|i-navajo|i-pwn|i-tao|i-tay|i-tsu|sgn-BE-FR|sgn-BE-NL|sgn-CH-DE)";
var grandfathered = "(?<grandfathered>" + irregular + "|" + regular + ")";
var privateUse = "(?<privateUse>x(-[A-Za-z0-9]{1,8})+)";
var singleton = "[0-9A-WY-Za-wy-z]";
var extension = "(?<extension>" + singleton + "(-[A-Za-z0-9]{2,8})+)";
var variant = "(?<variant>[A-Za-z0-9]{5,8}|[0-9][A-Za-z0-9]{3})";
var region = "(?<region>[A-Za-z]{2}|[0-9]{3})";
var script = "(?<script>[A-Za-z]{4})";
var extlang = "(?<extlang>[A-Za-z]{3}(-[A-Za-z]{3}){0,2})";
var language = "(?<language>([A-Za-z]{2,3}(-" + extlang + ")?)|[A-Za-z]{4}|[A-Za-z]{5,8})";
var langtag = "(" + language + "(-" + script + ")?" + "(-" + region + ")?" + "(-" + variant + ")*" + "(-" + extension + ")*" + "(-" + privateUse + ")?" + ")";
var languageTag = @"^(" + grandfathered + "|" + langtag + "|" + privateUse + ")$";
Console.WriteLine(languageTag);
我无法保证其正确性(我可能已经拼写错误),但它在附录A中的示例中工作正常。
根据您的环境,您可能需要删除指定的捕获组"?<...>"
。
答案 1 :(得分:3)
适用于PHP的优化版本。
/^(?<grandfathered>(?:en-GB-oed|i-(?:ami|bnn|default|enochian|hak|klingon|lux|mingo|navajo|pwn|t(?:a[oy]|su))|sgn-(?:BE-(?:FR|NL)|CH-DE))|(?:art-lojban|cel-gaulish|no-(?:bok|nyn)|zh-(?:guoyu|hakka|min(?:-nan)?|xiang)))|(?:(?<language>(?:[A-Za-z]{2,3}(?:-(?<extlang>[A-Za-z]{3}(?:-[A-Za-z]{3}){0,2}))?)|[A-Za-z]{4}|[A-Za-z]{5,8})(?:-(?<script>[A-Za-z]{4}))?(?:-(?<region>[A-Za-z]{2}|[0-9]{3}))?(?:-(?<variant>[A-Za-z0-9]{5,8}|[0-9][A-Za-z0-9]{3}))*(?:-(?<extension>[0-9A-WY-Za-wy-z](?:-[A-Za-z0-9]{2,8})+))*)(?:-(?<privateUse>x(?:-[A-Za-z0-9]{1,8})+))?$/Di
答案 2 :(得分:0)
JavaScript会监管重复的命名捕获组,因此您必须将?<privateUse>
的第二次使用更改为例如?<privateUse1>
。编译为:
/^((?<grandfathered>(en-GB-oed|i-ami|i-bnn|i-default|i-enochian|i-hak|i-klingon|i-lux|i-mingo|i-navajo|i-pwn|i-tao|i-tay|i-tsu|sgn-BE-FR|sgn-BE-NL|sgn-CH-DE)|(art-lojban|cel-gaulish|no-bok|no-nyn|zh-guoyu|zh-hakka|zh-min|zh-min-nan|zh-xiang))|((?<language>([A-Za-z]{2,3}(-(?<extlang>[A-Za-z]{3}(-[A-Za-z]{3}){0,2}))?)|[A-Za-z]{4}|[A-Za-z]{5,8})(-(?<script>[A-Za-z]{4}))?(-(?<region>[A-Za-z]{2}|[0-9]{3}))?(-(?<variant>[A-Za-z0-9]{5,8}|[0-9][A-Za-z0-9]{3}))*(-(?<extension>[0-9A-WY-Za-wy-z](-[A-Za-z0-9]{2,8})+))*(-(?<privateUse>x(-[A-Za-z0-9]{1,8})+))?)|(?<privateUse1>x(-[A-Za-z0-9]{1,8})+))$/
这是一种构造方法:
let privateUseUsed = 0
const privateUse = () => "(?<privateUse" + (privateUseUsed++) + ">x(-[A-Za-z0-9]{1,8})+)"
const grandfathered = "(?<grandfathered>" +
/* irregular */ (
"en-GB-oed" +
"|" + "i-(?:ami|bnn|default|enochian|hak|klingon|lux|mingo|navajo|pwn|tao|tay|tsu)" +
"|" + "sgn-(?:BE-FR|BE-NL|CH-DE)"
) +
"|" + /* regular */ (
"art-lojban|cel-gaulish|no-bok|no-nyn|zh-guoyu|zh-hakka|zh-min|zh-min-nan|zh-xiang"
) +
")"
const langtag = "(" +
"(?<language>" + (
"([A-Za-z]{2,3}(-" +
"(?<extlang>[A-Za-z]{3}(-[A-Za-z]{3}){0,2})" +
")?)|[A-Za-z]{4,8})"
) +
"(-" + "(?<script>[A-Za-z]{4})" + ")?" +
"(-" + "(?<region>[A-Za-z]{2}|[0-9]{3})" + ")?" +
"(-" + "(?<variant>[A-Za-z0-9]{5,8}|[0-9][A-Za-z0-9]{3})" + ")*" +
"(-" + "(?<extension>" + (
/* singleton */ "[0-9A-WY-Za-wy-z]" +
"(-[A-Za-z0-9]{2,8})+)"
) +
")*" +
"(-" + privateUse() + ")?" +
")"
const languageTagReStr = "^(" + grandfathered + "|" + langtag + "|" + privateUse() + ")$";
编辑:结果是ff不支持命名的捕获组,因此您必须使用.replace(/\?<a-zA-Z>/g, '')
将其删除,否则请先将它们删除:
const grandfathered = "(" +
/* irregular */ "(en-GB-oed|i-ami|i-bnn|i-default|i-enochian|i-hak|i-klingon|i-lux|i-mingo|i-navajo|i-pwn|i-tao|i-tay|i-tsu|sgn-BE-FR|sgn-BE-NL|sgn-CH-DE)" +
"|" +
/* regular */ "(art-lojban|cel-gaulish|no-bok|no-nyn|zh-guoyu|zh-hakka|zh-min|zh-min-nan|zh-xiang)" +
")";
const langtag = "(" +
"(" + (
"([A-Za-z]{2,3}(-" +
"([A-Za-z]{3}(-[A-Za-z]{3}){0,2})" +
")?)|[A-Za-z]{4}|[A-Za-z]{5,8})"
) +
"(-" + "([A-Za-z]{4})" + ")?" +
"(-" + "([A-Za-z]{2}|[0-9]{3})" + ")?" +
"(-" + "([A-Za-z0-9]{5,8}|[0-9][A-Za-z0-9]{3})" + ")*" +
"(-" + "(" + (
/* singleton */ "[0-9A-WY-Za-wy-z]" +
"(-[A-Za-z0-9]{2,8})+)"
) +
")*" +
"(-" + "(x(-[A-Za-z0-9]{1,8})+)" + ")?" +
")";
const languageTag = RegExp("^(" + grandfathered + "|" + langtag + "|" + "(x(-[A-Za-z0-9]{1,8})+)" + ")$");
使用languageTag.test('en-us')
答案 3 :(得分:0)
如果使用基于CLDR的功能集(例如PHP的intl
扩展名),则可以使用以下函数检查intl
数据库中是否存在语言环境:
<?php
function is_locale($locale=''){
// STANDARDISE INPUT
$locale=locale_canonicalize($locale);
// LOAD ARRAY WITH LOCALES
$locales=resourcebundle_locales(NULL);
// RETURN WHETHER FOUND
return (array_search($locale,$locales)!==F);
}
?>
加载和搜索数据大约需要半毫秒,因此不会对性能造成太大影响。
当然,它只会在所使用的PHP版本随附的CLDR版本的数据库中找到这些内容,但是会在以后的每个PHP版本中进行更新。