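I've spent the last few months developing a program that my company uses to clean and geocode addresses at scale (about 5,000 per day). It works well, but some of the address formats I see every day cause me problems.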
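Addresses in a format such as park avenue 1 cause problems for my geocoding. My idea for tackling this is to split the address into a list of words and find the delimiter word in that list. Delimiter words are words such as avenue, street, road, etc., and I keep them in a list called patterns. If the word after the delimiter is a short number, I want to move it to the front of the address. Here is my initial attempt at turning that idea into code: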
patterns = ['my list of delimiters']
address = 'park avenue 1'  # this is an example address
address = address.split(' ')
for pattern in patterns:
    location = address.index(pattern) + 1
    if address[location].isdigit() and len(address[location]) <= 4:
        # here is where I'm getting a bit confused
        # what would be a good way to go about moving the word to the first position in the list
        pass  # placeholder so the snippet parses
address = ' '.join(address)
Any help would be greatly appreciated. Thanks in advance, everyone.
Answer 0 (score: 1)
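Wrap the string address[location] in brackets to turn it into a one-element list, then concatenate it with the slices on either side of it:
address = [address[location]] + address[:location] + address[location+1:]
For example: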
address = ['park', 'avenue', '1']
location = 2
address = [address[location]] + address[:location] + address[location+1:]
print(' '.join(address)) # => '1 park avenue'
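The same slicing can be tied back to the loop in the question. What follows is only a sketch under assumptions: move_number_to_front is a made-up name, its default patterns tuple stands in for the asker's real delimiter list, and the 4-digit limit is copied from the question's code. It also guards the two places where the original loop could raise an exception (a pattern that is absent, and a road word at the very end of the address).

```python
def move_number_to_front(address, patterns=("avenue", "street", "road")):
    """Move a short house number that follows a road word to the front.

    `patterns` is a hypothetical stand-in for the asker's delimiter list.
    """
    words = address.split()
    for pattern in patterns:
        if pattern not in words:
            continue  # list.index would raise ValueError for a missing word
        location = words.index(pattern) + 1
        # Guard against the road word being the last word in the address.
        if location < len(words):
            number = words[location]
            if number.isdigit() and len(number) <= 4:
                # Same slicing as above: number first, then the rest in order.
                words = [number] + words[:location] + words[location + 1:]
                break
    return " ".join(words)


print(move_number_to_front("park avenue 1"))  # 1 park avenue
```

Addresses that already start with the number, such as 42 tobacco road, come back unchanged because nothing follows the road word.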
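Answer 1 (score: 1)
Here's a modified version of your code. It uses simple list slicing to rearrange the parts of the address list.
Rather than using a for loop to search for the matching road type, it uses set operations.
This code isn't perfect: it won't catch "numbers" like 12a, and it won't handle weird street names like "Avenue Road".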
road_patterns = {'avenue', 'street', 'road', 'lane'}

def fix_address(address):
    address_list = address.split()
    # Road words that actually occur in this address
    road = road_patterns.intersection(address_list)
    if len(road) == 0:
        print("Can't find a road pattern in", address_list)
    elif len(road) > 1:
        print("Ambiguous road pattern in", address_list, road)
    else:
        road = road.pop()
        # Position of the word that follows the road word
        index = address_list.index(road) + 1
        if index < len(address_list):
            number = address_list[index]
            if number.isdigit() and len(number) <= 4:
                # Move the number to the front, keeping the rest in order
                address_list = [number] + address_list[:index] + address_list[index + 1:]
                address = ' '.join(address_list)
    return address

addresses = (
    '42 tobacco road',
    'park avenue 1 a',
    'penny lane 17',
    'nonum road 12345',
    'strange street 23 london',
    'baker street 221b',
    '37 gasoline alley',
    '83 avenue road',
)

for address in addresses:
    fixed = fix_address(address)
    print('{!r} -> {!r}'.format(address, fixed))
Output
'42 tobacco road' -> '42 tobacco road'
'park avenue 1 a' -> '1 park avenue a'
'penny lane 17' -> '17 penny lane'
'nonum road 12345' -> 'nonum road 12345'
'strange street 23 london' -> '23 strange street london'
'baker street 221b' -> 'baker street 221b'
Can't find a road pattern in ['37', 'gasoline', 'alley']
'37 gasoline alley' -> '37 gasoline alley'
Ambiguous road pattern in ['83', 'avenue', 'road'] {'avenue', 'road'}
'83 avenue road' -> '83 avenue road'
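One possible way to cover the 12a case is to test the candidate word with a regular expression instead of str.isdigit. This is a sketch under assumptions and not part of the original answer: HOUSE_NUMBER and fix_address_alnum are hypothetical names, and the rule "1 to 4 digits plus at most one trailing letter" is only a guess at what should count as a house number. Ambiguous or missing road words are simply left alone here.

```python
import re

road_patterns = {'avenue', 'street', 'road', 'lane'}

# Assumed rule: 1-4 digits, optionally followed by a single letter (e.g. 12a).
HOUSE_NUMBER = re.compile(r'\d{1,4}[a-z]?', re.IGNORECASE)

def fix_address_alnum(address):
    address_list = address.split()
    road = road_patterns.intersection(address_list)
    if len(road) != 1:
        return address  # no road word, or more than one: leave unchanged
    index = address_list.index(road.pop()) + 1
    # fullmatch requires the whole word to fit the pattern, so 12345 is rejected.
    if index < len(address_list) and HOUSE_NUMBER.fullmatch(address_list[index]):
        number = address_list[index]
        address_list = [number] + address_list[:index] + address_list[index + 1:]
        return ' '.join(address_list)
    return address

print(fix_address_alnum('baker street 221b'))  # 221b baker street
print(fix_address_alnum('nonum road 12345'))   # nonum road 12345
```

A suffix written as a separate word, as in park avenue 1 a, is still not joined to the number; only the 1 moves to the front.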