I am using this package of functions: http://nadeausoftware.com/articles/2008/05/php_tip_how_parse_and_build_urls
If I call it like this:
$x = url_to_absolute('http://al-mashhad.com/News/النيابة-تستمع-لأقوال-خالد-يوسف-في-بلاغه-ضد-أبو-إسم/141274.aspx','../../Media/News/2012/12/16/2012-634912584761067771-106.jpg');
var_dump($x);
it returns false, because these functions do not support Arabic characters.
The problem is in this function in particular:
function split_url( $url, $decode=TRUE )
{
    // Character sets from RFC3986.
    $xunressub = 'a-zA-Z\d\-._~\!$&\'()*+,;=';
    $xpchar = $xunressub . ':@%';
    // Scheme from RFC3986.
    $xscheme = '([a-zA-Z][a-zA-Z\d+-.]*)';
    // User info (user + password) from RFC3986.
    $xuserinfo = '(([' . $xunressub . '%]*)' .
        '(:([' . $xunressub . ':%]*))?)';
    // IPv4 from RFC3986 (without digit constraints).
    $xipv4 = '(\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3})';
    // IPv6 from RFC2732 (without digit and grouping constraints).
    $xipv6 = '(\[([a-fA-F\d.:]+)\])';
    // Host name from RFC1035. Technically, must start with a letter.
    // Relax that restriction to better parse URL structure, then
    // leave host name validation to application.
    $xhost_name = '([a-zA-Z\d-.%]+)';
    // Authority from RFC3986. Skip IP future.
    $xhost = '(' . $xhost_name . '|' . $xipv4 . '|' . $xipv6 . ')';
    $xport = '(\d*)';
    $xauthority = '((' . $xuserinfo . '@)?' . $xhost .
        '?(:' . $xport . ')?)';
    // Path from RFC3986. Blend absolute & relative for efficiency.
    $xslash_seg = '(/[' . $xpchar . ']*)';
    $xpath_authabs = '((//' . $xauthority . ')((/[' . $xpchar . ']*)*))';
    $xpath_rel = '([' . $xpchar . ']+' . $xslash_seg . '*)';
    $xpath_abs = '(/(' . $xpath_rel . ')?)';
    $xapath = '(' . $xpath_authabs . '|' . $xpath_abs .
        '|' . $xpath_rel . ')';
    // Query and fragment from RFC3986.
    $xqueryfrag = '([' . $xpchar . '/?' . ']*)';
    // URL.
    $xurl = '^(' . $xscheme . ':)?' . $xapath . '?' .
        '(\?' . $xqueryfrag . ')?(#' . $xqueryfrag . ')?$';
    // Split the URL into components.
    if ( !preg_match( '!' . $xurl . '!', $url, $m ) )
        return FALSE;
    if ( !empty($m[2]) ) $parts['scheme'] = strtolower($m[2]);
    if ( !empty($m[7]) ) {
        if ( isset( $m[9] ) ) $parts['user'] = $m[9];
        else $parts['user'] = '';
    }
    if ( !empty($m[10]) ) $parts['pass'] = $m[11];
    if ( !empty($m[13]) ) $h=$parts['host'] = $m[13];
    else if ( !empty($m[14]) ) $parts['host'] = $m[14];
    else if ( !empty($m[16]) ) $parts['host'] = $m[16];
    else if ( !empty( $m[5] ) ) $parts['host'] = '';
    if ( !empty($m[17]) ) $parts['port'] = $m[18];
    if ( !empty($m[19]) ) $parts['path'] = $m[19];
    else if ( !empty($m[21]) ) $parts['path'] = $m[21];
    else if ( !empty($m[25]) ) $parts['path'] = $m[25];
    if ( !empty($m[27]) ) $parts['query'] = $m[28];
    if ( !empty($m[29]) ) $parts['fragment']= $m[30];
    if ( !$decode )
        return $parts;
    if ( !empty($parts['user']) )
        $parts['user'] = rawurldecode( $parts['user'] );
    if ( !empty($parts['pass']) )
        $parts['pass'] = rawurldecode( $parts['pass'] );
    if ( !empty($parts['path']) )
        $parts['path'] = rawurldecode( $parts['path'] );
    if ( isset($h) )
        $parts['host'] = rawurldecode( $parts['host'] );
    if ( !empty($parts['query']) )
        $parts['query'] = rawurldecode( $parts['query'] );
    if ( !empty($parts['fragment']) )
        $parts['fragment'] = rawurldecode( $parts['fragment'] );
    return $parts;
}
My question is:
How can I change the regular expressions so that they support Arabic characters in URLs?
Answer 0 (score: 0)
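The URL you show is not really a valid URL. Only ASCII characters are allowed in URLs; anything else needs to be percent-encoded. Browsers will display the correct characters anyway.
First run urlencode() on the Arabic parts of the URL, which converts the Arabic characters into %xx escape sequences; then run your function on the result. Note that calling urlencode() on the complete URL string would also escape delimiters such as : and /, so apply it only to the non-ASCII parts.
Even if you do this, modern browsers will still display the Arabic characters automatically.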
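A minimal sketch of the percent-encoding step, assuming the URL string is UTF-8 encoded. The helper name encode_non_ascii() is hypothetical and not part of the package from the question. It escapes only the bytes outside the ASCII range and leaves the URL delimiters intact, so the result should consist only of characters that the regular expressions in split_url() accept.

```php
<?php
// Hypothetical helper (not part of the original package): percent-encode
// every run of non-ASCII bytes, leaving ASCII characters such as ':', '/',
// '?' and '#' untouched. Assumes $url is a UTF-8 encoded string.
function encode_non_ascii( $url )
{
    return preg_replace_callback(
        '/[\x80-\xFF]+/',   // byte-wise match (no /u flag): any byte >= 0x80
        function ( $m ) {
            return rawurlencode( $m[0] );
        },
        $url
    );
}

$base    = 'http://al-mashhad.com/News/النيابة-تستمع-لأقوال-خالد-يوسف-في-بلاغه-ضد-أبو-إسم/141274.aspx';
$encoded = encode_non_ascii( $base );

// $encoded now contains only ASCII characters and can be passed on,
// for example as the first argument of url_to_absolute().
var_dump( $encoded );
```

With the package from the question loaded, the call would then be url_to_absolute( encode_non_ascii( $base ), $relative ), where $relative stands for the relative image path.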