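I need to write a loop that uses regular expressions to populate any of 4 variables:
$address, $street, $town, $lot
The loop will be fed a string that may contain information like the lines below: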
'123 any street, mytown'
or 'Lot 4 another road, thattown'
or 'Lot 2 96 other road, her town'
or 'this ave, this town'
or 'yourtown'
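Since anything after the comma is the $town, I thought of
(.*), (.*)
and then I could use (Lot \d*) (.*), (.*)
and check the first capture:
if the first capture starts with digits, it is the $address (and if it is words with spaces, it is the $street);
if it is a single word, it is the $town.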
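Answer 0 (score: 7)
I suggest you don't try to do all of this in a single regular expression, because it will be hard to verify that it is correct.
First, I would split at the comma. Whatever comes after the comma is the $town, and if there is no comma, the whole string is the $town.
Then I would check whether there is any lot information and extract it from the string.
Then I would look for the street number and the street/avenue name.
Divide and conquer :)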
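The steps above can be sketched roughly as follows. This is a minimal Python illustration, not the answerer's code: the function name `parse_address`, the dictionary keys, and the choice to split at the last comma are all assumptions made for the example.

```python
import re

def parse_address(text):
    """Split a free-form address into lot, number, street and town (hypothetical helper)."""
    result = {"lot": None, "number": None, "street": None, "town": None}

    # Step 1: whatever follows the (last) comma is the town;
    # with no comma, the whole string is the town.
    head, sep, tail = text.rpartition(",")
    if not sep:
        result["town"] = text.strip()
        return result
    result["town"] = tail.strip()
    rest = head.strip()

    # Step 2: extract an optional leading "Lot <digits>".
    m = re.match(r"(Lot \d+)\s*(.*)", rest)
    if m:
        result["lot"], rest = m.group(1), m.group(2)

    # Step 3: extract an optional street number; the remainder is the street name.
    m = re.match(r"(\d+)\s+(.*)", rest)
    if m:
        result["number"], rest = m.group(1), m.group(2)
    result["street"] = rest or None
    return result

for sample in ["123 any street, mytown", "Lot 2 96 other road, her town", "yourtown"]:
    print(parse_address(sample))
```

Because each step is a small, separate match, each one can be checked on its own against the sample strings.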
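Answer 1 (score: 7)
If these are US addresses, take a look at Geo::StreetAddress::US.
Even if they are not, the source of that module should give you an idea of what is involved in parsing free-form street addresses.
Here is a script that handles the addresses you posted (updated; an earlier version combined the lot and the number into a single string):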
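#!/usr/bin/perl
use strict; use warnings;
use Data::Dumper;

# Paragraph mode: records in __DATA__ are separated by blank lines.
local $/ = "";

my @addresses;

while ( my $address = <DATA> ) {
    chomp $address;
    $address =~ s/\s+/ /g;    # collapse newlines and runs of spaces

    my (%address, $rest);

    # Reverse the string and split once, so the split happens at the last comma.
    # The town follows the comma; without a comma the whole string is the town.
    ($address{town}, $rest) = map { scalar reverse }
        split( / ?, ?/, reverse($address), 2 );

    {
        # $rest is undef when the record has no comma
        no warnings 'uninitialized';
        # optional "Lot N" (one or more digits), optional street number, street name
        @address{qw(lot number street)} =
            $rest =~ /^(?:(Lot [0-9]+) )?(?:([0-9]+) )?(.+)\z/;
    }
    push @addresses, \%address;
}

print Dumper \@addresses;

__DATA__
123 any street,
mytown

Lot 4 another road,
thattown

Lot 2 96 other road,
her town

yourtown

street,
town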
Output:
$VAR1 = [
  { 'lot' => undef, 'number' => '123', 'street' => 'any street', 'town' => 'mytown' },
  { 'lot' => 'Lot 4', 'number' => undef, 'street' => 'another road', 'town' => 'thattown' },
  { 'lot' => 'Lot 2', 'number' => '96', 'street' => 'other road', 'town' => 'her town' },
  { 'lot' => undef, 'number' => undef, 'street' => undef, 'town' => 'yourtown' },
  { 'lot' => undef, 'number' => undef, 'street' => 'street', 'town' => 'town' }
];
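Answer 2 (score: 1)
This should split it into 3 parts. How do you distinguish the address from the street?
(Lot \d*)? ?([^,]*,)? ?(.*)
Here is the breakdown for your examples: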
('', '123 any street,', 'mytown')
('Lot 4', 'another road,', 'thattown')
('Lot 2', '96 other road,', 'her town')
('', 'this ave,', 'this town')
('', '', 'yourtown')
If I understand correctly, this one also separates the address from the street:
(Lot \d*)? ?(\d*) ?([^,]*,)? ?(.*)
('', '123', 'any street,', 'mytown')
('Lot 4', '', 'another road,', 'thattown')
('Lot 2', '96', 'other road,', 'her town')
('', '', 'this ave,', 'this town')
('', '', '', 'yourtown')
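As a rough check, the second pattern can be run over the sample strings. The sketch below uses Python's `re` module, which is an assumption; the answer does not say which engine produced the tuples. `split_address` is a hypothetical helper name, and stripping the trailing comma from the street is an addition not present in the answer.

```python
import re

# Pattern from the answer above: lot, street number, street (with trailing comma), town.
PATTERN = re.compile(r"(Lot \d*)? ?(\d*) ?([^,]*,)? ?(.*)")

def split_address(text):
    """Return (lot, number, street, town); parts that are absent come back as ''."""
    # Every part of the pattern is optional, so match() always succeeds.
    lot, number, street, town = PATTERN.match(text).groups(default="")
    # The street group captures the comma as well, so drop it here.
    return lot, number, street.rstrip(","), town

samples = [
    "123 any street, mytown",
    "Lot 4 another road, thattown",
    "Lot 2 96 other road, her town",
    "this ave, this town",
    "yourtown",
]
for sample in samples:
    print(split_address(sample))
```

Note that `\d*` also matches an empty string, so an input such as 'Lot road, town' would still yield a lot of 'Lot '; using `\d+` would be stricter.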
Answer 3 (score: 0)
I couldn't match the last one, but for the first 3 you can use something like this:
if (preg_match('/(?:Lot (\d*)|)(?: |)(?:(\d*)|) (.*), (.*)/m', $subject, $regs)) {
    $result = $regs[1];
} else {
    $result = "";
}
This is the regular expression being tested:
(?:Lot (\d*)|)(?: |)(?:(\d*)|) (.*), (.*)
You can test it in RegexBuddy: link
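To see which sample strings the pattern above handles, it can be tried in a few lines of Python. This is only a sketch: Python's `re.search` stands in for `preg_match` on the assumption that both engines treat this particular pattern the same way, and `try_pattern` is a hypothetical helper name.

```python
import re

# The pattern from the answer above, unchanged.
PATTERN = re.compile(r"(?:Lot (\d*)|)(?: |)(?:(\d*)|) (.*), (.*)")

def try_pattern(text):
    """Return (lot number, street number, street, town), or None when nothing matches."""
    m = PATTERN.search(text)
    return m.groups() if m else None

samples = [
    "123 any street, mytown",
    "Lot 4 another road, thattown",
    "Lot 2 96 other road, her town",
    "this ave, this town",
    "yourtown",
]
for sample in samples:
    print(repr(sample), "->", try_pattern(sample))
```

The first three samples split as intended. 'yourtown' does not match at all, because the pattern requires a space and a comma. 'this ave, this town' matches only from its first space onward, so the street comes back as 'ave'.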
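Answer 4 (score: 0)
Geo::StreetAddress::US works well for simple addresses, but it can lose context on harder examples. It parses the street name until it finds a suburb. So with "46 7th St. Johns Park", 'St.' is consumed too early, the street type is wrongly assigned to 'Park', and 'CA' becomes the suburb.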
Input (truncated as posted)          | Parsed result
2 Smith St Suburb NJ 12345           | 2 Smith St Suburb NJ 12345
25 MIRROR LAKE DR LITTLE EGG HARBOR  | 25 MIRROR LAKE DR Hbr NJ 0
74B Old Bohema Rd N, St. Johns Park  | 74 B Old Bohema Rd St Johns Park CA 95472
74 Mt Baw Baw Rd Suite C Some Park C | 74 Mt Baw Baw Rd S Park CA 0
74 Old Bohema Rd Bldg A Some Park CA | 74 Old Bohema Rd B Park CA 0
74 Old Bohema Rd Rm 123A Some Park C | 74 Old Bohema Rd R Park CA 0
Lot 74 Old Bohema Rd Some Park CA 95 | 0 Old Bohema Rd S Park CA 0
22 Glen Alpine Way Some Park CA 9547 | 22 Glen Alpine Way Park CA 0
4/6 Bohema Rd, St. Johns Park CA 954 | 4 6 Bohema Rd St Johns Park CA 95472
46 The Parade, St. Johns Park CA 954 | 46 The Parade 0
46 7th St. Johns Park CA 95472       | 46 7th St Johns Park CA 0
46 B Avenue Johns Park CA 95472      | 46 B Avenue Johns Park CA 0
46 Avenue C Johns Park CA 95472      | 46 Avenue C Johns Park CA 0
46 Broadway Johns Park CA 95472      | 46 Broadway Johns Park CA 0
46 State Route 19 Johns Park CA 9547 | 46 State Route 19 Park CA 0
46 John F Kennedy Drive Johns Park C | 46 John F Kennedy Park CA 0
PO Box 213 Somewhere IO 1234         | 0 Somewhere IO 0
1 BEACH DR SE # 2410 ST PETERSBURG F | 1 BEACH DR SE # 2 St PETERSBURG FL 33701
# 123 12 BEACH DR SE ST PETERSBURG F | 12 BEACH DR SE St PETERSBURG FL 33701
46 Broad Street #12 Suburb CA 95472  | 46 Broad St 0
I have developed a Perl module that recognizes many of these harder patterns: https://metacpan.org/release/Lingua-EN-AddressParse. It recognizes idioms such as "The Parade" and "nth Street", as well as forms such as "46 Broad Street #12", and more.