Conditionally rearranging the position of words in a string

Time: 2016-08-08 13:12:54

Tags: python

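I have spent the last few months developing a program that my company uses to clean and geocode addresses at scale (about 5,000 per day). It works well, but some of the address formats I see every day are a problem for me.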

Addresses in a format like park avenue 1 cause problems for my geocoding. My idea for solving this is as follows:

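  1. Split the address into a list.
  2. Find the index of my delimiter word in the list. Delimiter words are words such as avenue, street, road, etc. I have a list of these delimiters called patterns.
  3. Check whether the word after the delimiter consists only of digits and is 4 characters long or shorter. If the number is longer than 4 digits, it is probably a postal code, which I don't need. If it is 4 digits or shorter, it is most likely a house number.
  4. If the word meets the criteria from the previous step, move it to the first position in the list.
  5. Finally, join the list back into a string.

This is my initial attempt at turning the idea into code: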

    patterns = ['avenue', 'street', 'road']    # my list of delimiters (shortened here)
    address = 'park avenue 1'    # this is an example address
    address = address.split(' ')
    for pattern in patterns:
        if pattern not in address:
            continue    # list.index() raises ValueError for a missing word
        location = address.index(pattern) + 1
        if location < len(address) and address[location].isdigit() and len(address[location]) <= 4:
            # here is where i'm getting a bit confused
            # what would be a good way to go about moving the word to the first position in the list
            pass
    address = ' '.join(address)
    

Any help would be greatly appreciated. Thanks in advance, everyone.

2 answers:

Answer 0 (score: 1)

Wrap the string address[location] in brackets to turn it into a one-item list, then concatenate the remaining parts around it: address = [address[location]] + address[:location] + address[location+1:]. For example:

address = ['park', 'avenue', '1']
location = 2
address = [address[location]] + address[:location] + address[location+1:]

print(' '.join(address)) # => '1 park avenue'
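The same move can also be done in place. The following is only a minimal sketch, assuming the list may be mutated; the helper name move_to_front is hypothetical and does not appear in the question. It relies on list.pop, which removes and returns the item at an index, and list.insert, which places an item at a given position.

```python
def move_to_front(words, location):
    """Move words[location] to the start of the list, in place."""
    # pop() removes and returns the item; insert(0, ...) puts it first.
    words.insert(0, words.pop(location))
    return words

address = ['park', 'avenue', '1']
move_to_front(address, 2)
print(' '.join(address))  # => '1 park avenue'
```

Unlike the slicing version, this changes the original list instead of building a new one, which matters if other code still holds a reference to it.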


Answer 1 (score: 1)

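Here is a modified version of your code. It uses simple list slicing to rearrange the parts of the address list.

Instead of using a for loop to search for a matching road type, it uses a set operation.

This code isn't perfect: it won't catch "numbers" like 12a, and it won't handle odd street names like "Avenue Road".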

road_patterns = {'avenue', 'street', 'road', 'lane'}

def fix_address(address):
    address_list = address.split()
    road = road_patterns.intersection(address_list)
    if len(road) == 0:
        print("Can't find a road pattern in ", address_list)
    elif len(road) > 1:
        print("Ambiguous road pattern in ", address_list, road)
    else:
        road = road.pop()
        index = address_list.index(road) + 1
        if index < len(address_list):
            number = address_list[index]
            if number.isdigit() and len(number) <= 4:
                address_list = [number] + address_list[:index] + address_list[index + 1:]
                address = ' '.join(address_list)
    return address

addresses = (
    '42 tobacco road',
    'park avenue 1 a',
    'penny lane 17',
    'nonum road 12345',
    'strange street 23 london',
    'baker street 221b',
    '37 gasoline alley',
    '83 avenue road',
)

for address in addresses:
    fixed = fix_address(address)
    print('{!r} -> {!r}'.format(address, fixed))

Output

'42 tobacco road' -> '42 tobacco road'
'park avenue 1 a' -> '1 park avenue a'
'penny lane 17' -> '17 penny lane'
'nonum road 12345' -> 'nonum road 12345'
'strange street 23 london' -> '23 strange street london'
'baker street 221b' -> 'baker street 221b'
Can't find a road pattern in  ['37', 'gasoline', 'alley']
'37 gasoline alley' -> '37 gasoline alley'
Ambiguous road pattern in  ['83', 'avenue', 'road'] {'avenue', 'road'}
'83 avenue road' -> '83 avenue road'
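The two limitations mentioned in this answer could be relaxed along the following lines. This is a rough sketch under stated assumptions, not a tested geocoding rule: it assumes a house number is 1 to 4 digits optionally followed by a single letter, and that when several road words appear (as in "avenue road") the last one is the actual road type. The names fix_address_loose, ROAD_PATTERNS and HOUSE_NUMBER are made up for illustration.

```python
import re

ROAD_PATTERNS = {'avenue', 'street', 'road', 'lane'}
# Assumption: a house number is 1-4 digits, optionally followed by one letter.
HOUSE_NUMBER = re.compile(r'\d{1,4}[a-z]?', re.IGNORECASE)

def fix_address_loose(address):
    words = address.split()
    # Collect the index of every word that is a road type.
    road_indexes = [i for i, w in enumerate(words) if w.lower() in ROAD_PATTERNS]
    if not road_indexes:
        return address
    # Assumption: the last road word is the real road type ('avenue road').
    index = road_indexes[-1] + 1
    if index < len(words) and HOUSE_NUMBER.fullmatch(words[index]):
        # Same slicing rearrangement as above: number first, rest in order.
        words = [words[index]] + words[:index] + words[index + 1:]
    return ' '.join(words)

for address in ('baker street 221b', 'avenue road 83', 'nonum road 12345'):
    print('{!r} -> {!r}'.format(address, fix_address_loose(address)))
```

With these assumptions, 'baker street 221b' becomes '221b baker street' and 'avenue road 83' becomes '83 avenue road', while the 5-digit '12345' is still left in place as a likely postal code. Real address data will contain cases that break both assumptions, so the rules would need checking against actual samples.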