Jerry's code handles most of the strings, but not all of them. For example:
$pattern = '#^(?<tz_utf>(?:\([^)]+\)|[^-]+)+)\s+-\s+(?<tz>[^:]+)\s+:\s+(?<fr>[^/]+)\s+/\s+(?<en>[^/]+)\s+/\s+(?<ar>\S+)\s+(?<tz_dec_utf>[ⴰ-⵿ -]+)\s+(?<tz_dec>.*)$#imu';
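// $str4 does not match because there is no space between the slash and the word
// ("/Alphabet" would have to be "/ Alphabet")
// and because of the comma and space inside the Arabic part
// ("ájóéHCG ,á«é¡J" would have to be "ájóéHCGá«é¡J").
// $str5 and $str6 fail as well: their Arabic parts also contain a space.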
$str4 = 'ⴰⴳⵎⵎⴰⵢ - agemmay : Alphabet, épellation /Alphabet, spelling / ájóéHCG ,á«é¡J
ⴰⴳⵎⵎⴰⵢ - ⵓⴳⵎⵎⴰⵢ - ⵉⴳⵎⵎⴰⵢⵏ
agemmay – ugemmay – igemmayen';
$str5 = 'ⴰⴷⴷⴰⴷ ⴰⵎⴰⵔⵓⵣ - addad amaruz : Etat d’annexion / Construct state / ¥ÉëdEG ádÉM
ⴰⴷⴷⴰⴷ ⴰⵎⴰⵔⵓⵣ - ⵡⴰⴷⴷⴰⴷ ⴰⵎⴰⵔⵓⵣ
addad amaruz - waddad amaruz';
$str6 = 'ⴰⴷⴷⴰⴷ ⵉⵍⴻⵍⵍⵉ - addad ilelli : Etat libre / Free state / ∫É°SQEG ádÉM
ⴰⴷⴷⴰⴷ ⵉⵍⴻⵍⵍⵉ
addad ilelli';
foreach ([$str4, $str5, $str6] as $str) {
    print_r(preg_match($pattern, $str, $matches)); // prints 0: none of the three strings match
}
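One possible way to relax the pattern above so that $str4 to $str6 also match. This is a sketch, assuming every entry keeps the layout "tz_utf - tz : fr / en / ar", then a run of Tifinagh, then the Latin-script declension; it has only been checked against these samples. Compared with the original, two things change: the spaces around the slashes become optional (\s*/\s*, with lazy fr and en so they do not keep a trailing space), and ar is no longer \S+ but "anything up to the next Tifinagh run", so it may contain spaces and commas. The Tifinagh block U+2D30 to U+2D7F is written with \x{...} escapes, which need the u modifier; the m and i modifiers are dropped because each string is matched as a whole. parseEntry is a hypothetical helper name.

```php
<?php
// Sketch (assumption: every entry has the layout "tz_utf - tz : fr / en / ar", then a
// Tifinagh run, then the Latin declension). Tifinagh block: U+2D30..U+2D7F.
$tfng = '\x{2D30}-\x{2D7F}'; // \x{...} escapes above U+00FF are only valid with the /u modifier

$pattern = '~^'
    . '(?<tz_utf>(?:\([^)]+\)|[^-]+)+)\s+-\s+'             // Tifinagh headword (unchanged)
    . '(?<tz>[^:]+)\s+:\s+'                                 // Latin headword (unchanged)
    . '(?<fr>[^/]+?)\s*/\s*'                                // French; spaces around "/" optional
    . '(?<en>[^/]+?)\s*/\s*'                                // English; same
    . '(?<ar>[^' . $tfng . ']+?)\s+'                        // Arabic: everything before the next Tifinagh run
    . '(?<tz_dec_utf>[' . $tfng . '][' . $tfng . ' -]*)\s+' // declension in Tifinagh
    . '(?<tz_dec>.+)$~u';                                   // declension in Latin script (last line)

// Hypothetical helper: returns only the named groups, or null when there is no match.
function parseEntry(string $pattern, string $entry): ?array
{
    if (preg_match($pattern, $entry, $m) !== 1) {
        return null;
    }
    // preg_match stores each named group twice (by name and by number); keep the names only.
    return array_filter($m, 'is_string', ARRAY_FILTER_USE_KEY);
}

$str4 = "ⴰⴳⵎⵎⴰⵢ - agemmay : Alphabet, épellation /Alphabet, spelling / ájóéHCG ,á«é¡J\n"
      . "ⴰⴳⵎⵎⴰⵢ - ⵓⴳⵎⵎⴰⵢ - ⵉⴳⵎⵎⴰⵢⵏ\n"
      . "agemmay – ugemmay – igemmayen";
$str5 = "ⴰⴷⴷⴰⴷ ⴰⵎⴰⵔⵓⵣ - addad amaruz : Etat d’annexion / Construct state / ¥ÉëdEG ádÉM\n"
      . "ⴰⴷⴷⴰⴷ ⴰⵎⴰⵔⵓⵣ - ⵡⴰⴷⴷⴰⴷ ⴰⵎⴰⵔⵓⵣ\n"
      . "addad amaruz - waddad amaruz";
$str6 = "ⴰⴷⴷⴰⴷ ⵉⵍⴻⵍⵍⵉ - addad ilelli : Etat libre / Free state / ∫É°SQEG ádÉM\n"
      . "ⴰⴷⴷⴰⴷ ⵉⵍⴻⵍⵍⵉ\n"
      . "addad ilelli";

foreach ([$str4, $str5, $str6] as $entry) {
    print_r(parseEntry($pattern, $entry));
}
```

Because ar now stops only where whitespace is followed by a Tifinagh letter, the comma and the inner space in the Arabic text stay inside ar instead of breaking the match.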
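The code I'm using now only matches one part of the string ($matches[1]). Is it possible to extract the other parts of the string with a single regular expression?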
$pattern = '/-\s*(.*?)\s*:/u'; // text between the first "-" and ":", without the surrounding spaces
$str1 = 'ⵜⴰⵙⵎⵙⵙⵉⵜ - tasmessit : Focalisée / Focus / QCÉÑe ⵜⴰⵙⵎⵙⵙⵉⵜ - ⵜⵙⵎⵙⵙⵉⵜ - ⵜⵉⵙⵎⵙⵙⵉⵜⵉⵏ tasmssit - tsmssit - tismssitin';
preg_match($pattern, $str1, $matches);
$arr1 = array(
'tz_utf'=>'ⵜⴰⵙⵎⵙⵙⵉⵜ',
'tz'=> $matches[1], // tasmessit
'fr'=>'Focalisée',
'en'=>'Focus',
'ar'=>'QCÉÑe',
'tz_dec_utf'=>'ⵜⴰⵙⵎⵙⵙⵉⵜ - ⵜⵙⵎⵙⵙⵉⵜ - ⵜⵉⵙⵎⵙⵙⵉⵜⵉⵏ',
'tz_dec'=>'tasmssit - tsmssit - tismssitin'
);
print_r($matches[1]);
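Only $matches[1] is filled because that pattern has a single capture group: preg_match stores group 1 at index 1, group 2 at index 2, and so on, and a named group (?<name>...) is stored under both its number and its name. A small illustration with two named groups; the group names simply reuse the array keys from the question:

```php
<?php
$str1 = 'ⵜⴰⵙⵎⵙⵙⵉⵜ - tasmessit : Focalisée / Focus / QCÉÑe ⵜⴰⵙⵎⵙⵙⵉⵜ - ⵜⵙⵎⵙⵙⵉⵜ - ⵜⵉⵙⵎⵙⵙⵉⵜⵉⵏ tasmssit - tsmssit - tismssitin';

// Two capture groups: everything before " - ", and everything between " - " and " : ".
preg_match('/^(?<tz_utf>.+?)\s+-\s+(?<tz>.+?)\s+:/u', $str1, $matches);

echo $matches[1], "\n";        // ⵜⴰⵙⵎⵙⵙⵉⵜ
echo $matches['tz_utf'], "\n"; // same value, looked up by name
echo $matches['tz'], "\n";     // tasmessit
```

Adding one named group per field extends the same idea to the whole string.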
To any regex gurus out there :)
Could you help me preg_split some strings into an array? The string values vary, but they all follow a pattern similar to this:
$str1 = 'ⵜⴰⵙⵎⵙⵙⵉⵜ - tasmessit : Focalisée / Focus / QCÉÑe ⵜⴰⵙⵎⵙⵙⵉⵜ - ⵜⵙⵎⵙⵙⵉⵜ - ⵜⵉⵙⵎⵙⵙⵉⵜⵉⵏ tasmssit - tsmssit - tismssitin';
$str2 = 'ⵜⴰⵙⵏⴰⵥⵖⵓⵕⵜ ( ⵏ-) - tasnaÇvurt (n-) : Etymologique / Etymological / »dÉKCG ⵏ ⵜⵙⵏⴰⵥⵖⵓⵕⵜ n tesnaÇvurt';
$str3 = 'ⵜⴰⵙⵖⵓⵏⵜ ⵜⴰⵏⴰⴷⴰⵡⵜ - tasvunt tanadawt : Subordonnant / Subordinating (conjunction) / §HGQ ⵜⴰⵙⵖⵓⵏⵜ ⵜⴰⵏⴰⴷⴰⵡⵜ - ⵜⵉⵙⵖⵡⴰⵏ ⵜⵉⵏⴰⴷⴰⵡⵉⵏ tasvunt tanadawt - tisevwan tinadawin';
The correct result would be:
$arr1 = array(
'tz_utf'=>'ⵜⴰⵙⵎⵙⵙⵉⵜ',
'tz'=>'tasmessit',
'fr'=>'Focalisée',
'en'=>'Focus',
'ar'=>'QCÉÑe',
'tz_dec_utf'=>'ⵜⴰⵙⵎⵙⵙⵉⵜ - ⵜⵙⵎⵙⵙⵉⵜ - ⵜⵉⵙⵎⵙⵙⵉⵜⵉⵏ',
'tz_dec'=>'tasmssit - tsmssit - tismssitin'
);
$arr2 = array(
'tz_utf'=>'ⵜⴰⵙⵏⴰⵥⵖⵓⵕⵜ ( ⵏ-)',
'tz'=>'tasnaÇvurt (n-)',
'fr'=>'Etymologique',
'en'=>'Etymological',
'ar'=>'»dÉKCG',
'tz_dec_utf'=>'ⵏ ⵜⵙⵏⴰⵥⵖⵓⵕⵜ',
'tz_dec'=>'n tesnaÇvurt'
);
$arr3 = array(
'tz_utf'=>'ⵜⴰⵙⵖⵓⵏⵜ ⵜⴰⵏⴰⴷⴰⵡⵜ',
'tz'=>'tasvunt tanadawt',
'fr'=>'Subordonnant',
'en'=>'Subordinating (conjunction)',
'ar'=>'§HGQ',
'tz_dec_utf'=>'ⵜⴰⵙⵖⵓⵏⵜ ⵜⴰⵏⴰⴷⴰⵡⵜ - ⵜⵉⵙⵖⵡⴰⵏ ⵜⵉⵏⴰⴷⴰⵡⵉⵏ',
'tz_dec'=>'tasvunt tanadawt - tisevwan tinadawin'
);
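Since the question asks about preg_split: splitting on the " - ", " : " and " / " separators with a limit of 5 pieces recovers the first four fields, but the fifth piece still mixes ar, tz_dec_utf and tz_dec, which have no separator of their own between them. That suggests a single preg_match with one named group per field fits this format better. A sketch; the separator pattern is an assumption based on the samples:

```php
<?php
$samples = [
    'ⵜⴰⵙⵎⵙⵙⵉⵜ - tasmessit : Focalisée / Focus / QCÉÑe ⵜⴰⵙⵎⵙⵙⵉⵜ - ⵜⵙⵎⵙⵙⵉⵜ - ⵜⵉⵙⵎⵙⵙⵉⵜⵉⵏ tasmssit - tsmssit - tismssitin',
    'ⵜⴰⵙⵏⴰⵥⵖⵓⵕⵜ ( ⵏ-) - tasnaÇvurt (n-) : Etymologique / Etymological / »dÉKCG ⵏ ⵜⵙⵏⴰⵥⵖⵓⵕⵜ n tesnaÇvurt',
];

$results = [];
foreach ($samples as $s) {
    // Split at " - ", " : " or " / " with whitespace on both sides,
    // so the "-" in "( ⵏ-)" and "(n-)" is not treated as a separator.
    // Limit 5: at most 4 splits, so the " - " inside the declensions is left alone.
    $results[] = preg_split('~\s+-\s+|\s+:\s+|\s+/\s+~u', $s, 5);
}
print_r($results);
// Pieces 0-3 are tz_utf, tz, fr, en; piece 4 still holds ar, tz_dec_utf and tz_dec together.
```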
tz_utf is written in Unicode Tifinagh characters.
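In PCRE, and therefore in PHP's preg_* functions, Tifinagh can be written as the Unicode block range \x{2D30}-\x{2D7F} when the u modifier is set, which avoids typing the characters literally. A sketch of a check for the tz_utf values; the helper name isTifinaghField and the extra characters it allows (whitespace, parentheses and hyphen, taken from the "( ⵏ-)" sample) are assumptions:

```php
<?php
// Hypothetical helper: true when $s contains only Tifinagh letters (U+2D30..U+2D7F),
// whitespace, parentheses or hyphens, as in the tz_utf samples from the question.
function isTifinaghField(string $s): bool
{
    return preg_match('/^[\x{2D30}-\x{2D7F}\s()-]+$/u', $s) === 1;
}

var_dump(isTifinaghField('ⵜⴰⵙⵎⵙⵙⵉⵜ'));        // bool(true)
var_dump(isTifinaghField('ⵜⴰⵙⵏⴰⵥⵖⵓⵕⵜ ( ⵏ-)')); // bool(true)
var_dump(isTifinaghField('tasmessit'));        // bool(false)
```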
Thanks.
Answer (score: 1)
Try this regex:
~^(?<tz_utf>(?:\([^)]+\)|[^-]+)+)\s+-\s+(?<tz>[^:]+)\s+:\s+(?<fr>[^/]+)\s+/\s+(?<en>[^/]+)\s+/\s+(?<ar>\S+)\s+(?<tz_dec_utf>[ⴰ-⵿ -]+)\s+(?<tz_dec>.*)$~ui
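Warning: I'm not sure about the special characters in the ar (Arabic) part (I used \S+, assuming each one is a single word), and for the characters that show up as white squares for me I used the range from this site. But it works for the samples you provided.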
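A usage sketch for the answer's pattern, run on $str1 to $str3 from the question. It keeps only the named groups from $matches (array_filter with ARRAY_FILTER_USE_KEY drops the numeric duplicates), so each result has the same keys as the expected arrays. The literal range ⴰ-⵿ is written here with the equivalent escapes \x{2D30}-\x{2D7F}; otherwise the pattern is unchanged.

```php
<?php
// The answer's pattern; the literal range ⴰ-⵿ is written as \x{2D30}-\x{2D7F} (same code points).
$pattern = '~^(?<tz_utf>(?:\([^)]+\)|[^-]+)+)\s+-\s+(?<tz>[^:]+)\s+:\s+(?<fr>[^/]+)\s+/\s+'
         . '(?<en>[^/]+)\s+/\s+(?<ar>\S+)\s+(?<tz_dec_utf>[\x{2D30}-\x{2D7F} -]+)\s+(?<tz_dec>.*)$~ui';

$samples = [
    'ⵜⴰⵙⵎⵙⵙⵉⵜ - tasmessit : Focalisée / Focus / QCÉÑe ⵜⴰⵙⵎⵙⵙⵉⵜ - ⵜⵙⵎⵙⵙⵉⵜ - ⵜⵉⵙⵎⵙⵙⵉⵜⵉⵏ tasmssit - tsmssit - tismssitin',
    'ⵜⴰⵙⵏⴰⵥⵖⵓⵕⵜ ( ⵏ-) - tasnaÇvurt (n-) : Etymologique / Etymological / »dÉKCG ⵏ ⵜⵙⵏⴰⵥⵖⵓⵕⵜ n tesnaÇvurt',
    'ⵜⴰⵙⵖⵓⵏⵜ ⵜⴰⵏⴰⴷⴰⵡⵜ - tasvunt tanadawt : Subordonnant / Subordinating (conjunction) / §HGQ ⵜⴰⵙⵖⵓⵏⵜ ⵜⴰⵏⴰⴷⴰⵡⵜ - ⵜⵉⵙⵖⵡⴰⵏ ⵜⵉⵏⴰⴷⴰⵡⵉⵏ tasvunt tanadawt - tisevwan tinadawin',
];

$results = [];
foreach ($samples as $str) {
    if (preg_match($pattern, $str, $matches) === 1) {
        // Keep string keys only: each named group is stored under its name and its number.
        $results[] = array_filter($matches, 'is_string', ARRAY_FILTER_USE_KEY);
    }
}
print_r($results);
```

Because ar is \S+ here, this version relies on the Arabic part being a single space-free word, which holds for these three samples.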