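I need to parse a street address in PHP from a string that may contain abbreviations. The string comes from a text input. The fields I need to search for are:
For example, the user submits one of the following texts:
I would like to get the result as an array:
My code so far is below, but it does not yet work for examples 3, 4, 5, and 6: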
<?php
//posted address
$address = "str main one bldg 5b other param area 1";
//to replace
$replace = ['street'=>['st','str'],
            'building'=>['bldg','bld'],
            'number'=>['nr','numb','nmbr']];
//replace
foreach($replace as $field=>$abbrs)
    foreach($abbrs as $abbr)
        $address = str_replace($abbr.' ',$field.' ',$address);
//fields
$fields = array_keys($replace);
//match (use the field names themselves, not their numeric indexes)
if(preg_match_all('/('.implode('|',$fields).')\s+([^\s]+)/si', $address, $matches)) {
    //matches
    $search = array_combine($matches[1], $matches[2]);
    //other
    $search['other'] = str_replace($matches[0],"",$address);
}else{
    //search in all the fields
    $search['other'] = $address;
}
//search
print_r($search);
Code tester: http://ideone.com/j3q4YI
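One weakness of the replacement step above is that `str_replace` also rewrites abbreviations that appear at the end of longer words (for instance the "st " in "test "). A minimal sketch of a word-boundary-aware alternative follows; the helper name `expandAbbreviations` is hypothetical, and it still does not handle glued forms such as "stMain".

```php
<?php
// Hypothetical helper: expand abbreviations only when they stand as whole words.
function expandAbbreviations(string $address, array $replace): string
{
    foreach ($replace as $field => $abbrs) {
        // Quote each abbreviation so it is treated literally inside the pattern.
        $quoted = array_map(function ($abbr) {
            return preg_quote($abbr, '/');
        }, $abbrs);
        // \b anchors keep "st" from matching inside words such as "test" or "street".
        $pattern = '/\b(?:' . implode('|', $quoted) . ')\b/i';
        $address = preg_replace($pattern, $field, $address);
    }
    return $address;
}

$replace = [
    'street'   => ['st', 'str'],
    'building' => ['bldg', 'bld'],
    'number'   => ['nr', 'numb', 'nmbr'],
];

echo expandAbbreviations('str main one bldg 5b other param area 1', $replace), "\n";
```

The result can then be fed to the same `preg_match_all` step as before.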
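Answer 0: (score: 1)
Thanks for your help! I had assumed I would need to do something like multiple preg_match calls.
I just found a PHP extension that does exactly what I want.
The library is PHP Postal (https://github.com/openvenues/php-postal) and it requires libpostal. When PHP starts, loading the library takes about 15-20 seconds; after that everything works fine.
Total execution time for parsing: 0.00030-0.00060 seconds.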
$parsed = Postal\Parser::parse_address("The Book Club 100-106 Leonard St, Shoreditch, London, Greater London, EC2A 4RH, United Kingdom");
foreach ($parsed as $component) {
    echo "{$component['label']}: {$component['value']}\n";
}
Output:
house: the book club
house_number: 100-106
road: leonard st
suburb: shoreditch
city: london
state_district: greater london
postcode: ec2a 4rh
country: united kingdom
All I have to do after this is swap in my own labels and format the address.
I hope this helps others who want to parse addresses in PHP.
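The relabeling step mentioned above might look roughly like this. It is only a sketch: the `$labelMap` entries and the `relabel` helper are assumptions made for illustration, and the sample `$parsed` array merely imitates the label/value shape shown in the output above so the snippet runs without the extension.

```php
<?php
// Sample data in the label/value shape printed above; in real use this
// would come from Postal\Parser::parse_address().
$parsed = [
    ['label' => 'house_number', 'value' => '100-106'],
    ['label' => 'road',         'value' => 'leonard st'],
    ['label' => 'city',         'value' => 'london'],
    ['label' => 'postcode',     'value' => 'ec2a 4rh'],
];

// Hypothetical mapping from libpostal labels to the field names in the question.
$labelMap = [
    'road'         => 'street',
    'house_number' => 'number',
];

function relabel(array $parsed, array $labelMap): array
{
    $search = [];
    foreach ($parsed as $component) {
        // Labels without a mapping fall back to the "other" bucket.
        $key = $labelMap[$component['label']] ?? 'other';
        // Join values that land on the same key with a space.
        $search[$key] = isset($search[$key])
            ? $search[$key] . ' ' . $component['value']
            : $component['value'];
    }
    return $search;
}

print_r(relabel($parsed, $labelMap));
```

Sending unmapped labels to "other" keeps the result in the same shape as the array asked for in the question.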
Answer 1: (score: 0)
$addresses=array(
    "street Main Road Bulding H7 Number 5 Area 1",
    "st Main Road bldg H7 Nr 5 Ar 5",
    "stMain bldgh7",
    "ar5 unknown other search parameter",
    "street Main Road h7 2b",
    "street main street str main road"
);
$regex["area"]="/^(.*?)(ar(?:ea)?\s?)([1-5])(.*?)$/i";
$regex["number"]="/^(.*?)(n(?:umbe)?r\s?)([0-9]+)(.*?)$/i";
$regex["building"]="/^(.*?)(bu?i?ldi?n?g\s?)([^\s]+)(.*?)$/i";
$regex["corner"]="/^(.*?str?(?:eet)?)\s?(str?(?:eet)?.*)$/i"; // 2 streets in string
$regex["street"]="/^(.*?)(str?(?:eet)?\s?)([^\s]*(?:\s?ro?a?d|\s?str?e?e?t?|.*?))(\s?.*?)$/i";
$regex["other"]="/^(.+)$/";
$search=[];
foreach($addresses as $i=>$address){
    echo "<br><div><b>$address</b> breakdown:</div>";
    foreach($regex as $key=>$rgx){
        if(strlen($address)>0){
            //echo "<div>addr(",strlen($address),") $address</div>";
            if(preg_match($rgx,$address,$matches)){
                if($key=="other"){
                    $search[$i][$key]=$matches[0]; // everything that remains
                }elseif($key=="corner"){
                    $search[$i]["street"]=""; // NOTICE suppression
                    // loop through both halves of corner address omitting element[0]
                    foreach(array_diff_key($matches,array('')) as $half){
                        //echo "half= $half<br>";
                        if(preg_match($regex["street"],$half,$half_matches)){
                            //print_r($half_matches);
                            $search[$i]["street"].=(strlen($search[$i]["street"])>0?"&&":"").ucwords($half_matches[3]);
                            $address=trim($half_matches[1].$half_matches[4]);
                            // $matches[2] is the discarded identifier
                            //echo "<div>$key Found: {$search[$i][$key]}</div>";
                            //echo "<div>Remaining: $address</div>";
                        }
                    }
                }else{
                    $search[$i][$key]=($key=="street"?ucwords($matches[3]):$matches[3]);
                    $address=trim($matches[1].$matches[4]);
                    // $matches[2] is the discarded identifier
                    //echo "<div>$key Found: {$search[$i][$key]}</div>";
                    //echo "<div>Remaining: $address</div>";
                    //print_r($matches);
                }
            }
        }else{
            break; // address is fully processed
        }
    }
    echo "<pre>";
    var_export($search[$i]);
    echo "</pre>";
}
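The output is an array that satisfies your brief, but because I don't capture the address components in sequence, the keys come out in a different order. That shouldn't matter to you, so I didn't bother re-sorting it.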
street Main Road Bulding H7 Number 5 Area 1 breakdown:
array (
'area' => '1',
'number' => '5',
'building' => 'H7',
'street' => 'Main Road',
)
st Main Road bldg H7 Nr 5 Ar 5 breakdown:
array (
'area' => '5',
'number' => '5',
'building' => 'H7',
'street' => 'Main Road',
)
stMain bldgh7 breakdown:
array (
'building' => 'h7',
'street' => 'Main',
)
ar5 unknown other search parameter breakdown:
array (
'area' => '5',
'other' => 'unknown other search parameter',
)
street Main Road h7 2b breakdown:
array (
'street' => 'Main Road',
'other' => 'h7 2b',
)
street main street str main road breakdown:
array (
'street' => 'Main Street&&Main Road',
)
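If a fixed key order does turn out to matter, each result could be re-sorted afterwards. A minimal sketch, assuming the order listed in `$order` is the one wanted; both `$order` and the `reorderKeys` helper are hypothetical names:

```php
<?php
// Hypothetical desired key order; adjust as needed.
$order = ['street', 'building', 'number', 'area', 'other'];

function reorderKeys(array $result, array $order): array
{
    $sorted = [];
    foreach ($order as $key) {
        // Copy only the keys that were actually captured.
        if (array_key_exists($key, $result)) {
            $sorted[$key] = $result[$key];
        }
    }
    // Array union appends any keys not named in $order, so nothing is lost.
    return $sorted + $result;
}

$result = ['area' => '1', 'number' => '5', 'building' => 'H7', 'street' => 'Main Road'];
var_export(reorderKeys($result, $order));
```

In the loop above this would be applied to `$search[$i]` just before the `var_export` call.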
...Boy, I'm glad this project isn't mine. Good luck!