Regex to find State and Zip in an address

Time: 2016-10-15 09:05:38

Tags: regex

I am trying to write a regular expression that extracts the state from an address

1- 1234 Bellaire Blvd, Suite 123, Houston, TX 77036

2- 1234 BELLAIRE BL #123, HOUSTON, TX 77036

I have this for the state

  

\w{2}(?=\s\d{1,5})

And this for the Zip

  

(?<=\w{2}\s)\d{5}

FOR STATE

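In the first case, the regex returns "te" from "Suite" as well as TX, which is the correct state

In the second case, however, it returns nothing

FOR ZIP

It returns 77036 in the first case and null in the second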

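2 Answers:

Answer 0: (score: 1)

I don't think regex is the best approach here. Instead, I would use an API that parses the address into its components. You would then read state_abbreviation from the result and you're sorted. Example response: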

[
    {
        "input_index": 0,
        "candidate_index": 0,
        "delivery_line_1": "1 Santa Claus Ln",
        "last_line": "North Pole AK 99705-9901",
        "delivery_point_barcode": "997059901010",
        "components": {
            "primary_number": "1",
            "street_name": "Santa Claus",
            "street_suffix": "Ln",
            "city_name": "North Pole",
            "state_abbreviation": "AK",
            "zipcode": "99705",
            "plus4_code": "9901",
            "delivery_point": "01",
            "delivery_point_check_digit": "0"
        },
        "metadata": {
            "record_type": "S",
            "zip_type": "Standard",
            "county_fips": "02090",
            "county_name": "Fairbanks North Star",
            "carrier_route": "C004",
            "congressional_district": "AL",
            "rdi": "Commercial",
            "elot_sequence": "0001",
            "elot_sort": "A",
            "latitude": 64.75233,
            "longitude": -147.35297,
            "precision": "Zip8",
            "time_zone": "Alaska",
            "utc_offset": -9,
            "dst": true
        },
        "analysis": {
            "dpv_match_code": "Y",
            "dpv_footnotes": "AABB",
            "dpv_cmra": "N",
            "dpv_vacant": "N",
            "active": "Y",
            "footnotes": "L#"
        }
    },

    {
        "input_index": 1,
        "candidate_index": 0,
        "addressee": "Apple Inc",
        "delivery_line_1": "1 Infinite Loop",
        // truncated for brevity
    }
]

Hope that helps.
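As a rough illustration of reading the fields out of such a response, here is a minimal sketch in Python. It assumes the response body is available as a JSON string shaped like the example above; `response_text` is a trimmed copy of that example and `state_and_zip` is a hypothetical helper name, not part of any particular API client. No request is actually sent.

```python
import json

# Trimmed copy of the example response above; only the fields used here are kept.
response_text = """
[
    {
        "input_index": 0,
        "last_line": "North Pole AK 99705-9901",
        "components": {
            "city_name": "North Pole",
            "state_abbreviation": "AK",
            "zipcode": "99705",
            "plus4_code": "9901"
        }
    }
]
"""


def state_and_zip(candidate):
    """Return (state, zip) for one candidate; missing fields become None."""
    components = candidate.get("components", {})
    return components.get("state_abbreviation"), components.get("zipcode")


candidates = json.loads(response_text)
results = [state_and_zip(c) for c in candidates]
print(results)  # [('AK', '99705')]
```

Using `.get` keeps the helper from raising if a candidate comes back without a `components` block.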

Answer 1: (score: 0)

You can match ', ([A-Z]{2}) '. The state will be the subpattern captured by the parentheses. In Python it looks like this.

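import re

s1 = "1- 1234 Bellaire Blvd, Suite 123, Houston, TX 77036"
s2 = "2- 1234 BELLAIRE BL #123, HOUSTON, TX 77036"

# The state is captured by the parenthesized group; guard against no match.
for s in (s1, s2):
    m = re.search(r', ([A-Z]{2}) ', s)
    print(m.group(1) if m else None)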
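The same idea can be extended to capture the Zip as well. A pattern that accepts any two word characters before a space and digits also matches the "te" in "Suite 123", so the sketch below ties the state to the preceding comma and to the end of the string instead. The pattern and the names `STATE_ZIP` and `extract_state_zip` are my own assumptions for illustration, and it only covers addresses that end in "ST 12345" or "ST 12345-6789".

```python
import re

# Hypothetical pattern: a comma, two capital letters (the state),
# then a 5-digit Zip with an optional +4 suffix at the end of the string.
STATE_ZIP = re.compile(r",\s*([A-Z]{2})\s+(\d{5})(?:-\d{4})?\s*$")


def extract_state_zip(address):
    """Return (state, zip) if the address ends in 'ST 12345', else None."""
    m = STATE_ZIP.search(address)
    return (m.group(1), m.group(2)) if m else None


addresses = [
    "1- 1234 Bellaire Blvd, Suite 123, Houston, TX 77036",
    "2- 1234 BELLAIRE BL #123, HOUSTON, TX 77036",
]

for address in addresses:
    print(extract_state_zip(address))  # ('TX', '77036') for both
```

Because the comma may be followed by zero or more spaces (`\s*`), this also accepts input written as "Houston,TX 77036".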