Address super regular expression

Date: 2014-09-08 09:10:48

Tags: c# regex conditional

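I'm trying to write a regular expression that can capture addresses in many forms. It all works well until I try to allow for a suburb that consists of more than one word.

Here's what I get at the moment: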

Input:
"Unit 1/61 bob-bob east st. bobville vic 3070"
Output Groups:
PropertyType = "Unit"
Unit = "1"
Number = "61"
Street = "bob-bob east"
Street Type = "st"
Suburb = "bobville"
State = "VIC"
Postcode = "3070"

Input:
"Unit 1/61 bob-bob east st. bobville west vic 3070"
Output Groups:
PropertyType = "Unit"
Unit = "1"
Number = "61"
Street = "bob-bob east"
Street Type = "st"
Suburb = "bobville"
State = ""
Postcode = ""

Here is the regular expression:

new MyRegex("Address2", @"((?<PropertyType>Unit|Lot|Level|Floor|P.?O.? Box)\b)?" +
@"\s*((?<Unit>\d+)(/|\\|-| ))?" +
@"\s*(?<Number>\d+)" +
@"\s*(?<Street>[a-z]+((\s*|-?)[a-z]+)*?)" +
@"\s*(?<StreetType>st|rd|ave|hwy|cct|ct|cl|gr|street|road|avenue|highway|circuit|court|close|grove)\.?" +
@"\s*(?<Suburb>[a-z]+((\s*|-?)[a-z]+)*?)?" +
@"\s*(?<State>Victoria|Tasmania|Queensland|New South Wales|(South|Western) Australia|(Northern|Australian Capital) Territory|VIC|NSW|SA|WA|NT|TAS|ACT|QLD)?" +
@"\s*(?<Postcode>\d{4})?"
, RegexOptions.IgnoreCase | RegexOptions.ExplicitCapture)

Replacing the Suburb line with:

\s*(?<Suburb>[a-z]+(((\s*|-?)[a-z]+){1,2}?)?)?

correctly captures "Unit 1/61 bob-bob east st. bobville west vic 3070", but won't capture "Unit 1/61 bob-bob east st. bobville-bob west vic 3070".

Similarly, replacing the Suburb line with:

\s*(?<Suburb>[a-z]+(((\s*|-?)[a-z]+){1,2})?)?

captures "Unit 1/61 bob-bob east st. bobville-east west vic 3070", but not "Unit 1/61 bob-bob east st. bobville west vic 3070".

Replacing the Suburb line with:

\s*(?<Suburb>[a-z]+((\s*|-?)[a-z]+){0,2}?)?

doesn't handle anything other than "Unit 1/61 bob-bob east st. bobville vic 3070". Changing {0,2}? to {0,2} then also captures the state in the Suburb group.
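One possible reading of these results: every group after Suburb is optional and the pattern has no end anchor, so a lazy Suburb can stop after its first word and the overall match still succeeds. Below is a minimal sketch of that effect. The class name, the constants and the simplified `Tail` pattern are hypothetical and only stand in for the last three lines of the regex above.

```csharp
using System;
using System.Text.RegularExpressions;

public static class LazySuburbDemo
{
    // Hypothetical, simplified tail: lazy multi-word suburb, then optional state and postcode.
    public const string Tail =
        @"(?<Suburb>[a-z]+((\s+|-)[a-z]+)*?)" +
        @"(\s+(?<State>VIC|NSW|SA|WA|NT|TAS|ACT|QLD))?" +
        @"(\s+(?<Postcode>\d{4}))?";

    public const string Unanchored = Tail;
    // Anchoring forces the lazy suburb to grow until the rest of the input is consumed.
    public const string Anchored = "^" + Tail + "$";

    public static string Describe(string pattern, string input)
    {
        Match m = Regex.Match(input, pattern,
            RegexOptions.IgnoreCase | RegexOptions.ExplicitCapture);
        return string.Format("Suburb='{0}' State='{1}' Postcode='{2}'",
            m.Groups["Suburb"].Value, m.Groups["State"].Value, m.Groups["Postcode"].Value);
    }

    public static void Main()
    {
        // Suburb='bobville' State='' Postcode=''
        Console.WriteLine(Describe(Unanchored, "bobville west vic 3070"));
        // Suburb='bobville west' State='vic' Postcode='3070'
        Console.WriteLine(Describe(Anchored, "bobville west vic 3070"));
        // Suburb='bobville-bob west' State='vic' Postcode='3070'
        Console.WriteLine(Describe(Anchored, "bobville-bob west vic 3070"));
        // Suburb='bobville' State='vic' Postcode='3070'
        Console.WriteLine(Describe(Anchored, "bobville vic 3070"));
    }
}
```

With the anchors in place the lazy quantifier no longer needs an upper bound such as {1,2}, because the engine keeps adding words to Suburb only until State, Postcode and the end of the string can all match.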

Any ideas on how I can clean this up?

1 Answer:

Answer 0: (score: 0)

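I've put together a faster address regex that also fails in a timely manner. It's based on this regex: www.regexlib.com

I thought I'd post it in case someone needs something similar down the track: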

new Regex(@"
^(
    ((?<PropertyType>[a-z\ ,\.']+?)\ *?)?
    ((?<Unit>\d+)(,|/|-|[\ ]*?))?
    (\b(?<Number>\d+[a-z]?)\b)\ *?
    (?<Street>[\w\ '-]+)
    (\b(?<StreetType>STREET|ST|ROAD|RD|GROVE|GR|DRIVE|DR|AVENUE|AVE|CIRCUIT|CCT|CLOSE|CL|COURT|CRT|CT|CRESCENT|CRES|PLACE|PL|PARADE|PDE|BOULEVARD|BLVD|HIGHWAY|HWY|ALLEY|ALLY|APPROACH|APP|ARCADE|ARC|BROW|BYPASS|BYPA|CAUSEWAY|CWAY|CIRCUS|CIRC|COPSE|CPSE|CORNER|CNR|COVE|END|ESPLANADE|ESP|FLAT|FREEWAY|FWAY|FRONTAGE|FRNT|GARDENS|GDNS|GLADE|GLD|GLEN|GREEN|GRN|HEIGHTS|HTS|LANE|LINK|LOOP|MALL|MEWS|PACKET|PCKT|PARK|PARKWAY|PKWY|PROMENADE|PROM|RESERVE|RES|RIDGE|RDGE|RISE|ROW|SQUARE|SQ|STRIP|STRP|TARN|TERRACE|TCE|THOROUGHFARE|TFRE|TRACK|TRAC|TRUNKWAY|TWAY|VIEW|VISTA|VSTA|WALK|WAY|WALKWAY|WWAY|YARD)\b).?,?\ *?
 )
 ((?<Suburb>[a-z'.]+([\-,\ ]+[a-z'.]+)*?),?\ *?)?
 (\b(?<State>New\ South\ Wales|NSW|Victoria|VIC|Queensland|QLD|Australian\ Capital\ Territory|ACT|South\ Australia|SA|West\ Australia|WA|Tasmania|TAS|Northern\ Territory|NT)\b,?\ *?)?
 ((?<Postcode>\d{4}),?\ *?)?
 (Au(s(tralia)?)?)?
 (\s(?=[^$]))* 
$"
, RegexOptions.IgnoreCase | 
  RegexOptions.ExplicitCapture |
  RegexOptions.IgnorePatternWhitespace)
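
As a usage sketch, the named groups of a regex like the one above could be read back into a dictionary as shown below. The `AddressGroups` class, the `Extract` helper and the short pattern in `Main` are hypothetical and only illustrate the idea. The match timeout passed to the `Regex` constructor (available from .NET 4.5) is an extra safeguard I'm assuming here, not part of the regex above.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class AddressGroups
{
    // Returns the named groups that took part in the match,
    // or null when the input does not match or the match timeout elapses.
    public static Dictionary<string, string> Extract(Regex regex, string input)
    {
        Match m;
        try { m = regex.Match(input); }
        catch (RegexMatchTimeoutException) { return null; }
        if (!m.Success) return null;

        var result = new Dictionary<string, string>();
        foreach (string name in regex.GetGroupNames())
        {
            int ignored;
            if (int.TryParse(name, out ignored)) continue; // skip numbered groups such as "0"
            Group g = m.Groups[name];
            if (g.Success) result[name] = g.Value;         // optional groups that did not match are omitted
        }
        return result;
    }

    public static void Main()
    {
        // Hypothetical short pattern, used only to keep the example small.
        var regex = new Regex(
            @"^(?<Number>\d+)\s+(?<Street>[a-z ]+?)\s+(?<StreetType>st|rd)$",
            RegexOptions.IgnoreCase | RegexOptions.ExplicitCapture,
            TimeSpan.FromMilliseconds(250));

        Dictionary<string, string> groups = Extract(regex, "61 bob east st");
        foreach (KeyValuePair<string, string> pair in groups)
            Console.WriteLine(pair.Key + " = \"" + pair.Value + "\"");
    }
}
```

Passing the full address regex instead of the short pattern should work the same way, since `Extract` only relies on the group names the regex itself reports.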