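I'm trying to write a regular expression that can capture addresses in a number of forms. It all works well until I try to allow for the possibility that the suburb has more than one word.
Here's what I've got at the moment: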
Input:
"Unit 1/61 bob-bob east st. bobville vic 3070"
Output Groups:
PropertyType = "Unit"
Unit = "1"
Number = "61"
Street = "bob-bob east"
Street Type = "st"
Suburb = "bobville"
State = "VIC"
Postcode = "3070"
Input:
"Unit 1/61 bob-bob east st. bobville west vic 3070"
Output Groups:
PropertyType = "Unit"
Unit = "1"
Number = "61"
Street = "bob-bob east"
Street Type = "st"
Suburb = "bobville"
State = ""
Postcode = ""
Here is the regex:
new MyRegex("Address2", @"((?<PropertyType>Unit|Lot|Level|Floor|P.?O.? Box)\b)?" +
@"\s*((?<Unit>\d+)(/|\\|-| ))?" +
@"\s*(?<Number>\d+)" +
@"\s*(?<Street>[a-z]+((\s*|-?)[a-z]+)*?)" +
@"\s*(?<StreetType>st|rd|ave|hwy|cct|ct|cl|gr|street|road|avenue|highway|circuit|court|close|grove)\.?" +
@"\s*(?<Suburb>[a-z]+((\s*|-?)[a-z]+)*?)?" +
@"\s*(?<State>Victoria|Tasmania|Queensland|New South Wales|(South|Western) Australia|(Northern|Australian Capital) Territory|VIC|NSW|SA|WA|NT|TAS|ACT|QLD)?" +
@"\s*(?<Postcode>\d{4})?"
, RegexOptions.IgnoreCase | RegexOptions.ExplicitCapture)
Replacing the suburb line with:
\s*(?<Suburb>[a-z]+(((\s*|-?)[a-z]+){1,2}?)?)?
captures "Unit 1/61 bob-bob east st. bobville west vic 3070" correctly, but "Unit 1/61 bob-bob east st. bobville-bob west vic 3070" won't match.
Similarly, replacing the suburb line with:
\s*(?<Suburb>[a-z]+(((\s*|-?)[a-z]+){1,2})?)?
captures "Unit 1/61 bob-bob east st. bobville-east west vic 3070", but not "Unit 1/61 bob-bob east st. bobville west vic 3070".
Replacing the suburb line with:
\s*(?<Suburb>[a-z]+((\s*|-?)[a-z]+){0,2}?)?
doesn't like anything except "Unit 1/61 bob-bob east st. bobville vic 3070". Changing {0,2}? to {0,2} makes the suburb group capture the state as well.
Any ideas on how I can clean this up?
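The lazy-versus-greedy behaviour described here can be reproduced on a reduced pattern. The following is a minimal sketch, not the full address regex: it uses the standard `Regex` class instead of `MyRegex`, keeps only the suburb, state, and postcode parts, shortens the state list, and simplifies the word separator to `(\s+|-)`. The class and helper names are made up for illustration.

```csharp
using System;
using System.Text.RegularExpressions;

class LazyVersusGreedyDemo
{
    static void Main()
    {
        const RegexOptions options = RegexOptions.IgnoreCase | RegexOptions.ExplicitCapture;
        const string input = "bobville west vic 3070";

        // Reduced tail of the pattern: optional state, then optional postcode.
        // The state list is shortened for this demo (assumption).
        const string tail = @"\s*(?<State>VIC|NSW)?\s*(?<Postcode>\d{4})?";

        // Lazy {0,2}?: everything after the suburb is optional and nothing anchors
        // the match to the end of the input, so the engine stops after one word.
        Show("lazy", new Regex(@"(?<Suburb>[a-z]+((\s+|-)[a-z]+){0,2}?)" + tail, options), input);
        // Expected: Suburb="bobville" State="" Postcode=""

        // Greedy {0,2}: the suburb takes up to two extra words, including the state.
        Show("greedy", new Regex(@"(?<Suburb>[a-z]+((\s+|-)[a-z]+){0,2})" + tail, options), input);
        // Expected: Suburb="bobville west vic" State="" Postcode="3070"
    }

    // Hypothetical helper: prints the three named groups of the first match.
    static void Show(string label, Regex regex, string input)
    {
        Match m = regex.Match(input);
        Console.WriteLine("{0}: Suburb=\"{1}\" State=\"{2}\" Postcode=\"{3}\"",
            label,
            m.Groups["Suburb"].Value,
            m.Groups["State"].Value,
            m.Groups["Postcode"].Value);
    }
}
```

In both variants the match succeeds without ever being forced to consume the rest of the input, which is why neither quantifier alone gives the desired split.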
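Answer 0 (score: 0)
I've put together a faster address regex that also fails in a timely manner. It is based on this regex: www.regexlib.com
I thought I'd post it in case somebody needs something similar down the track: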
new Regex(@"
^(
((?<PropertyType>[a-z\ ,\.']+?)\ *?)?
((?<Unit>\d+)(,|/|-|[\ ]*?))?
(\b(?<Number>\d+[a-z]?)\b)\ *?
(?<Street>[\w\ '-]+)
(\b(?<StreetType>STREET|ST|ROAD|RD|GROVE|GR|DRIVE|DR|AVENUE|AVE|CIRCUIT|CCT|CLOSE|CL|COURT|CRT|CT|CRESCENT|CRES|PLACE|PL|PARADE|PDE|BOULEVARD|BLVD|HIGHWAY|HWY|ALLEY|ALLY|APPROACH|APP|ARCADE|ARC|BROW|BYPASS|BYPA|CAUSEWAY|CWAY|CIRCUS|CIRC|COPSE|CPSE|CORNER|CNR|COVE|END|ESPLANADE|ESP|FLAT|FREEWAY|FWAY|FRONTAGE|FRNT|GARDENS|GDNS|GLADE|GLD|GLEN|GREEN|GRN|HEIGHTS|HTS|LANE|LINK|LOOP|MALL|MEWS|PACKET|PCKT|PARK|PARKWAY|PKWY|PROMENADE|PROM|RESERVE|RES|RIDGE|RDGE|RISE|ROW|SQUARE|SQ|STRIP|STRP|TARN|TERRACE|TCE|THOROUGHFARE|TFRE|TRACK|TRAC|TRUNKWAY|TWAY|VIEW|VISTA|VSTA|WALK|WAY|WALKWAY|WWAY|YARD)\b).?,?\ *?
)
((?<Suburb>[a-z'.]+([\-,\ ]+[a-z'.]+)*?),?\ *?)?
(\b(?<State>New\ South\ Wales|NSW|Victoria|VIC|Queensland|QLD|Australian\ Capital\ Territory|ACT|South\ Australia|SA|West\ Australia|WA|Tasmania|TAS|Northern\ Territory|NT)\b,?\ *?)?
((?<Postcode>\d{4}),?\ *?)?
(Au(s(tralia)?)?)?
(\s(?=[^$]))*
$"
, RegexOptions.IgnoreCase |
RegexOptions.ExplicitCapture |
RegexOptions.IgnorePatternWhitespace)
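Two features of the pattern above do most of the work for multi-word suburbs: the `^` and `$` anchors force the lazy suburb group to keep expanding until the state and postcode can match up to the end of the input, and `RegexOptions.IgnorePatternWhitespace` is why literal spaces are written as `\ `. The sketch below illustrates both on a much smaller pattern. It is an assumption-laden reduction rather than the regex above: the state list is cut down, and the class name and sample inputs are invented for the example.

```csharp
using System;
using System.Text.RegularExpressions;

class AnchoredTailDemo
{
    static void Main()
    {
        // Reduced, hypothetical suburb/state/postcode tail. Unescaped whitespace in
        // the pattern is ignored and '#' starts a comment, so literal spaces are "\ ".
        var regex = new Regex(@"
            ^
            (?<Suburb>[a-z]+([\-\ ][a-z]+)*?)                    # words joined by space or hyphen
            (\ +(?<State>New\ South\ Wales|NSW|Victoria|VIC))?   # optional state
            (\ +(?<Postcode>\d{4}))?                             # optional postcode
            $",
            RegexOptions.IgnoreCase |
            RegexOptions.ExplicitCapture |
            RegexOptions.IgnorePatternWhitespace);

        string[] inputs =
        {
            "bobville vic 3070",
            "bobville west vic 3070",
            "bobville-bob west New South Wales 2000"
        };

        foreach (string input in inputs)
        {
            Match m = regex.Match(input);
            Console.WriteLine("Suburb=\"{0}\" State=\"{1}\" Postcode=\"{2}\"",
                m.Groups["Suburb"].Value,
                m.Groups["State"].Value,
                m.Groups["Postcode"].Value);
        }
        // Expected for the second input:
        // Suburb="bobville west" State="vic" Postcode="3070"
    }
}
```

Because the suburb quantifier is lazy, the engine tries the shortest suburb first and only adds another word when the rest of the pattern cannot reach `$`, so a trailing state name is claimed by the State group rather than the suburb.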