Can you include unsupported scripts in \p{Language}?

Date: 2019-03-14 23:15:50

Tags: regex

Using Drupal 8.x, PHP 7

While writing some regex replacements, I came across the \p{Language} syntax, for example \p{Armenian}.

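I like this approach because it is defensible. I can say "this supports the official scripts, and nothing more", and it seems flexible enough. However, using \p{English} as the value returns an error.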

To work around this, I tried the following code:

// Get current site language.
$language =  \Drupal::languageManager()->getCurrentLanguage()->getName();

// Filter by Language.
if ($language == 'English') {
  // Search for any word that starts with '@' in English.
  $pattern = '/@(\w+)/';
}
else {
  // Search for any word that starts with '@' by language.
  $pattern = '~@(\p{' . $language . '}+)~u';
}

// Execute replacement.
$replaceText = preg_replace($pattern, $replacement, $text);

Is it possible to extend \p{}, or to combine several languages?

1 answer

Answer (score: 1)

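\p{...} matches characters by Unicode script (writing system), not by language, which is why \p{English} fails. English is written in the Latin script, so the property you want is \p{Latin}. Some entries below, such as Afak, Blis, Cirt, Jpan, Latf or Zxxx, are ISO 15924 codes that are not values of the Unicode Script property. A Unicode-based engine like PCRE, which PHP's preg_* functions use, will therefore not accept them inside \p{}. The following scripts are listed in CLDR (Unicode v11):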

\p{Adlam}
\p{Afak}
\p{Ahom}
\p{Anatolian_Hieroglyphs}
\p{Arabic}
\p{Armenian}
\p{Avestan}
\p{Balinese}
\p{Bamum}
\p{Bassa_Vah}
\p{Batak}
\p{Bengali}
\p{Bhaiksuki}
\p{Blis}
\p{Bopomofo}
\p{Brahmi}
\p{Braille}
\p{Buginese}
\p{Buhid}
\p{Canadian_Aboriginal}
\p{Carian}
\p{Caucasian_Albanian}
\p{Chakma}
\p{Cham}
\p{Cherokee}
\p{Cirt}
\p{Common}
\p{Coptic}
\p{Cuneiform}
\p{Cypriot}
\p{Cyrillic}
\p{Cyrs}
\p{Deseret}
\p{Devanagari}
\p{Dogra}
\p{Duployan}
\p{Egyd}
\p{Egyh}
\p{Egyptian_Hieroglyphs}
\p{Elbasan}
\p{Ethiopic}
\p{Geok}
\p{Georgian}
\p{Glagolitic}
\p{Gothic}
\p{Grantha}
\p{Greek}
\p{Gujarati}
\p{Gunjala_Gondi}
\p{Gurmukhi}
\p{Han}
\p{Hanb}
\p{Hangul}
\p{Hanifi_Rohingya}
\p{Hans}
\p{Hant}
\p{Hanunoo}
\p{Hatran}
\p{Hebrew}
\p{Hiragana}
\p{Imperial_Aramaic}
\p{Inds}
\p{Inherited}
\p{Inscriptional_Pahlavi}
\p{Inscriptional_Parthian}
\p{Jamo}
\p{Javanese}
\p{Jpan}
\p{Jurc}
\p{Kaithi}
\p{Kannada}
\p{Katakana}
\p{Katakana_Or_Hiragana}
\p{Kayah_Li}
\p{Kharoshthi}
\p{Khmer}
\p{Khojki}
\p{Khudawadi}
\p{Kore}
\p{Kpel}
\p{Lao}
\p{Latf}
\p{Latg}
\p{Latin}
\p{Lepcha}
\p{Limbu}
\p{Linear_A}
\p{Linear_B}
\p{Lisu}
\p{Loma}
\p{Lycian}
\p{Lydian}
\p{Mahajani}
\p{Makasar}
\p{Malayalam}
\p{Mandaic}
\p{Manichaean}
\p{Marchen}
\p{Masaram_Gondi}
\p{Maya}
\p{Medefaidrin}
\p{Meetei_Mayek}
\p{Mende_Kikakui}
\p{Meroitic_Cursive}
\p{Meroitic_Hieroglyphs}
\p{Miao}
\p{Modi}
\p{Mongolian}
\p{Moon}
\p{Mro}
\p{Multani}
\p{Myanmar}
\p{Nabataean}
\p{New_Tai_Lue}
\p{Newa}
\p{Nkgb}
\p{Nko}
\p{Nushu}
\p{Ogham}
\p{Ol_Chiki}
\p{Old_Hungarian}
\p{Old_Italic}
\p{Old_North_Arabian}
\p{Old_Permic}
\p{Old_Persian}
\p{Old_Sogdian}
\p{Old_South_Arabian}
\p{Old_Turkic}
\p{Oriya}
\p{Osage}
\p{Osmanya}
\p{Pahawh_Hmong}
\p{Palmyrene}
\p{Pau_Cin_Hau}
\p{Phags_Pa}
\p{Phlv}
\p{Phoenician}
\p{Psalter_Pahlavi}
\p{Rejang}
\p{Roro}
\p{Runic}
\p{Samaritan}
\p{Sara}
\p{Saurashtra}
\p{Sharada}
\p{Shavian}
\p{Siddham}
\p{SignWriting}
\p{Sinhala}
\p{Sogdian}
\p{Sora_Sompeng}
\p{Soyombo}
\p{Sundanese}
\p{Syloti_Nagri}
\p{Syre}
\p{Syriac}
\p{Syrj}
\p{Syrn}
\p{Tagalog}
\p{Tagbanwa}
\p{Tai_Le}
\p{Tai_Tham}
\p{Tai_Viet}
\p{Takri}
\p{Tamil}
\p{Tangut}
\p{Telugu}
\p{Teng}
\p{Thaana}
\p{Thai}
\p{Tibetan}
\p{Tifinagh}
\p{Tirhuta}
\p{Ugaritic}
\p{Unknown}
\p{Vai}
\p{Visp}
\p{Warang_Citi}
\p{Wole}
\p{Yi}
\p{Zanabazar_Square}
\p{Zmth}
\p{Zsye}
\p{Zsym}
\p{Zxxx}
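Because the list mixes real Unicode script names with codes that PCRE does not know, it can help to check a name before interpolating it into a pattern. What follows is a minimal sketch; the function name script_is_supported is hypothetical. It relies on preg_match() returning false when a pattern fails to compile, which is what happens with \p{English}. Which names pass depends on the PCRE version bundled with your PHP build.

```php
<?php
/**
 * Hypothetical helper: returns true if the PCRE bundled with PHP accepts
 * \p{$name} (a script name such as "Armenian", or another Unicode
 * property name such as "L").
 */
function script_is_supported(string $name): bool
{
    // Only allow identifier-like names, so arbitrary input cannot inject
    // other regex syntax into the pattern built below.
    if (!preg_match('/^[A-Za-z_]+\z/', $name)) {
        return false;
    }
    // A pattern that fails to compile makes preg_match() emit a warning
    // and return false; @ suppresses the warning, we only need the result.
    return @preg_match('/\p{' . $name . '}/u', '') !== false;
}

foreach (['Armenian', 'Latin', 'English', 'Afak'] as $name) {
    printf("%-10s %s\n", $name, script_is_supported($name) ? 'supported' : 'not supported');
}
```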

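CLDR also lists the following Script_Extensions values. Script_Extensions additionally covers characters that are used by more than one script. Entries naming several scripts are the combinations that occur in the Unicode data, and regex engine support for this property varies: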

\p{Script_Extensions=Adlam}
\p{Script_Extensions=Ahom}
\p{Script_Extensions=Anatolian_Hieroglyphs}
\p{Script_Extensions=Arabic}
\p{Script_Extensions=Arabic Coptic}
\p{Script_Extensions=Arabic Hanifi_Rohingya}
\p{Script_Extensions=Arabic Syriac}
\p{Script_Extensions=Arabic Syriac Mandaic Manichaean Psalter_Pahlavi Adlam Hanifi_Rohingya Sogdian}
\p{Script_Extensions=Arabic Syriac Thaana}
\p{Script_Extensions=Arabic Syriac Thaana Hanifi_Rohingya}
\p{Script_Extensions=Arabic Thaana}
\p{Script_Extensions=Armenian}
\p{Script_Extensions=Armenian Georgian}
\p{Script_Extensions=Avestan}
\p{Script_Extensions=Balinese}
\p{Script_Extensions=Bamum}
\p{Script_Extensions=Bassa_Vah}
\p{Script_Extensions=Batak}
\p{Script_Extensions=Bengali}
\p{Script_Extensions=Bengali Devanagari}
\p{Script_Extensions=Bengali Devanagari Gujarati Gurmukhi Kannada Latin Malayalam Oriya Tamil Telugu Grantha Sharada Tirhuta}
\p{Script_Extensions=Bengali Devanagari Gujarati Gurmukhi Kannada Latin Malayalam Oriya Tamil Telugu Grantha Tirhuta}
\p{Script_Extensions=Bengali Devanagari Gujarati Gurmukhi Kannada Malayalam Oriya Sinhala Tamil Telugu Limbu Syloti_Nagri Grantha Khudawadi Takri Tirhuta Mahajani Dogra Gunjala_Gondi}
\p{Script_Extensions=Bengali Devanagari Gujarati Gurmukhi Kannada Malayalam Oriya Sinhala Tamil Telugu Syloti_Nagri Grantha Khudawadi Takri Tirhuta Mahajani Dogra Gunjala_Gondi}
\p{Script_Extensions=Bengali Devanagari Kannada Grantha}
\p{Script_Extensions=Bengali Syloti_Nagri Chakma}
\p{Script_Extensions=Bhaiksuki}
\p{Script_Extensions=Bopomofo}
\p{Script_Extensions=Bopomofo Han}
\p{Script_Extensions=Bopomofo Han Hangul Hiragana Katakana}
\p{Script_Extensions=Bopomofo Han Hangul Hiragana Katakana Yi}
\p{Script_Extensions=Brahmi}
\p{Script_Extensions=Braille}
\p{Script_Extensions=Buginese}
\p{Script_Extensions=Buginese Javanese}
\p{Script_Extensions=Buhid}
\p{Script_Extensions=Canadian_Aboriginal}
\p{Script_Extensions=Carian}
\p{Script_Extensions=Caucasian_Albanian}
\p{Script_Extensions=Chakma}
\p{Script_Extensions=Cham}
\p{Script_Extensions=Cherokee}
\p{Script_Extensions=Common}
\p{Script_Extensions=Coptic}
\p{Script_Extensions=Cuneiform}
\p{Script_Extensions=Cypriot}
\p{Script_Extensions=Cypriot Linear_B}
\p{Script_Extensions=Cypriot Linear_B Linear_A}
\p{Script_Extensions=Cyrillic}
\p{Script_Extensions=Cyrillic Glagolitic}
\p{Script_Extensions=Cyrillic Latin}
\p{Script_Extensions=Cyrillic Old_Permic}
\p{Script_Extensions=Deseret}
\p{Script_Extensions=Devanagari}
\p{Script_Extensions=Devanagari Grantha}
\p{Script_Extensions=Devanagari Gujarati Gurmukhi Kaithi Khudawadi Takri Khojki Tirhuta Mahajani Modi Dogra}
\p{Script_Extensions=Devanagari Gujarati Gurmukhi Kannada Kaithi Khudawadi Takri Khojki Tirhuta Mahajani Modi Dogra}
\p{Script_Extensions=Devanagari Gujarati Gurmukhi Kannada Malayalam Kaithi Khudawadi Takri Khojki Tirhuta Mahajani Modi Dogra}
\p{Script_Extensions=Devanagari Kaithi Mahajani Dogra}
\p{Script_Extensions=Devanagari Kannada Grantha}
\p{Script_Extensions=Devanagari Kannada Malayalam Oriya Tamil Telugu}
\p{Script_Extensions=Devanagari Latin Grantha}
\p{Script_Extensions=Devanagari Sharada}
\p{Script_Extensions=Devanagari Tamil}
\p{Script_Extensions=Dogra}
\p{Script_Extensions=Duployan}
\p{Script_Extensions=Egyptian_Hieroglyphs}
\p{Script_Extensions=Elbasan}
\p{Script_Extensions=Ethiopic}
\p{Script_Extensions=Georgian}
\p{Script_Extensions=Georgian Latin}
\p{Script_Extensions=Glagolitic}
\p{Script_Extensions=Gothic}
\p{Script_Extensions=Grantha}
\p{Script_Extensions=Greek}
\p{Script_Extensions=Gujarati}
\p{Script_Extensions=Gujarati Khojki}
\p{Script_Extensions=Gunjala_Gondi}
\p{Script_Extensions=Gurmukhi}
\p{Script_Extensions=Gurmukhi Multani}
\p{Script_Extensions=Han}
\p{Script_Extensions=Han Hiragana Katakana}
\p{Script_Extensions=Hangul}
\p{Script_Extensions=Hanifi_Rohingya}
\p{Script_Extensions=Hanunoo}
\p{Script_Extensions=Hatran}
\p{Script_Extensions=Hebrew}
\p{Script_Extensions=Hiragana}
\p{Script_Extensions=Hiragana Katakana}
\p{Script_Extensions=Imperial_Aramaic}
\p{Script_Extensions=Inherited}
\p{Script_Extensions=Inscriptional_Pahlavi}
\p{Script_Extensions=Inscriptional_Parthian}
\p{Script_Extensions=Javanese}
\p{Script_Extensions=Kaithi}
\p{Script_Extensions=Kannada}
\p{Script_Extensions=Katakana}
\p{Script_Extensions=Kayah_Li}
\p{Script_Extensions=Kharoshthi}
\p{Script_Extensions=Khmer}
\p{Script_Extensions=Khojki}
\p{Script_Extensions=Khudawadi}
\p{Script_Extensions=Lao}
\p{Script_Extensions=Latin}
\p{Script_Extensions=Latin Myanmar Kayah_Li}
\p{Script_Extensions=Lepcha}
\p{Script_Extensions=Limbu}
\p{Script_Extensions=Linear_A}
\p{Script_Extensions=Linear_B}
\p{Script_Extensions=Lisu}
\p{Script_Extensions=Lycian}
\p{Script_Extensions=Lydian}
\p{Script_Extensions=Mahajani}
\p{Script_Extensions=Makasar}
\p{Script_Extensions=Malayalam}
\p{Script_Extensions=Mandaic}
\p{Script_Extensions=Manichaean}
\p{Script_Extensions=Marchen}
\p{Script_Extensions=Masaram_Gondi}
\p{Script_Extensions=Medefaidrin}
\p{Script_Extensions=Meetei_Mayek}
\p{Script_Extensions=Mende_Kikakui}
\p{Script_Extensions=Meroitic_Cursive}
\p{Script_Extensions=Meroitic_Hieroglyphs}
\p{Script_Extensions=Miao}
\p{Script_Extensions=Modi}
\p{Script_Extensions=Mongolian}
\p{Script_Extensions=Mongolian Phags_Pa}
\p{Script_Extensions=Mro}
\p{Script_Extensions=Multani}
\p{Script_Extensions=Myanmar}
\p{Script_Extensions=Myanmar Tai_Le Chakma}
\p{Script_Extensions=Nabataean}
\p{Script_Extensions=New_Tai_Lue}
\p{Script_Extensions=Newa}
\p{Script_Extensions=Nko}
\p{Script_Extensions=Nushu}
\p{Script_Extensions=Ogham}
\p{Script_Extensions=Ol_Chiki}
\p{Script_Extensions=Old_Hungarian}
\p{Script_Extensions=Old_Italic}
\p{Script_Extensions=Old_North_Arabian}
\p{Script_Extensions=Old_Permic}
\p{Script_Extensions=Old_Persian}
\p{Script_Extensions=Old_Sogdian}
\p{Script_Extensions=Old_South_Arabian}
\p{Script_Extensions=Old_Turkic}
\p{Script_Extensions=Oriya}
\p{Script_Extensions=Osage}
\p{Script_Extensions=Osmanya}
\p{Script_Extensions=Pahawh_Hmong}
\p{Script_Extensions=Palmyrene}
\p{Script_Extensions=Pau_Cin_Hau}
\p{Script_Extensions=Phags_Pa}
\p{Script_Extensions=Phoenician}
\p{Script_Extensions=Psalter_Pahlavi}
\p{Script_Extensions=Rejang}
\p{Script_Extensions=Runic}
\p{Script_Extensions=Samaritan}
\p{Script_Extensions=Saurashtra}
\p{Script_Extensions=Sharada}
\p{Script_Extensions=Shavian}
\p{Script_Extensions=Siddham}
\p{Script_Extensions=SignWriting}
\p{Script_Extensions=Sinhala}
\p{Script_Extensions=Sogdian}
\p{Script_Extensions=Sora_Sompeng}
\p{Script_Extensions=Soyombo}
\p{Script_Extensions=Sundanese}
\p{Script_Extensions=Syloti_Nagri}
\p{Script_Extensions=Syriac}
\p{Script_Extensions=Tagalog}
\p{Script_Extensions=Tagalog Hanunoo Buhid Tagbanwa}
\p{Script_Extensions=Tagbanwa}
\p{Script_Extensions=Tai_Le}
\p{Script_Extensions=Tai_Tham}
\p{Script_Extensions=Tai_Viet}
\p{Script_Extensions=Takri}
\p{Script_Extensions=Tamil}
\p{Script_Extensions=Tamil Grantha}
\p{Script_Extensions=Tangut}
\p{Script_Extensions=Telugu}
\p{Script_Extensions=Thaana}
\p{Script_Extensions=Thai}
\p{Script_Extensions=Tibetan}
\p{Script_Extensions=Tifinagh}
\p{Script_Extensions=Tirhuta}
\p{Script_Extensions=Ugaritic}
\p{Script_Extensions=Unknown}
\p{Script_Extensions=Vai}
\p{Script_Extensions=Warang_Citi}
\p{Script_Extensions=Yi}
\p{Script_Extensions=Zanabazar_Square}
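Since \p{} takes a script rather than a language, one way to avoid the English special case in the question is to map the site's language code to the script or scripts it is written in. Those scripts can then be combined in a single character class, which also answers the "combine" part of the question. What follows is a minimal sketch. It assumes a hypothetical, hand-maintained table and a hypothetical function name, build_mention_pattern. In the question's Drupal code, the language code would come from \Drupal::languageManager()->getCurrentLanguage()->getId() rather than getName().

```php
<?php
/**
 * Hypothetical helper: builds a pattern matching "@" followed by one or
 * more characters from the script(s) that $langcode is written in.
 */
function build_mention_pattern(string $langcode): string
{
    // Hypothetical lookup table: language code => Unicode script names.
    // \p{} works on scripts, not languages, so each language is mapped.
    static $languageScripts = [
        'en' => ['Latin'],
        'fr' => ['Latin'],
        'hy' => ['Armenian'],
        'ru' => ['Cyrillic'],
        // Japanese mixes several scripts; they are combined in one class.
        'ja' => ['Han', 'Hiragana', 'Katakana'],
    ];

    $scripts = $languageScripts[$langcode] ?? [];
    if ($scripts === []) {
        // Language missing from the table: fall back to any Unicode letter.
        $class = '\p{L}';
    } else {
        $class = implode('', array_map(function ($script) {
            return '\p{' . $script . '}';
        }, $scripts));
    }

    // The u modifier makes PCRE treat pattern and subject as UTF-8.
    return '~@([' . $class . ']+)~u';
}

$text = 'Hello @anna and @Աննա';
// Only the Armenian handle is wrapped: Hello @anna and <b>Աննա</b>
echo preg_replace(build_mention_pattern('hy'), '<b>$1</b>', $text), "\n";
```

Unlike the question's \w branch for English, \p{Latin} does not match digits or underscores. If handles can contain them, add \d and _ to the character class.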