Parsing and iterating over a list of URLs in Python

Time: 2018-02-18 06:43:04

Tags: python

website_list = [
    'https://www.zillow.com/62347390?location=Chicago%2N%23253',
    'https://www.zillow.com/82983250?location=Boston%3B%53324',
    'https://www.zillow.com/12917837?location=Miami%7K%26345',
]

How can I write a Python function (for example city_finder()) that, given website_list as input, produces the following output?

>>> city_finder(website_list)
['Chicago', 'Boston', 'Miami']

4 answers:

Answer 0 (score: 2)

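The other answers assume that the URL format will not change. A regular expression does not account for unexpected URL forms.

To cope with changes in the URL format, use the urllib.parse module, which is covered in the Python standard library documentation.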

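Specifically, use the urlparse() function, which splits a URL into its components. The component you want is the query component, exposed as the query attribute of the result. Splitting it into key=value pairs gives, for the location key, a value such as 'Chicago%2N%23253'. Finally, take the substring before the first %. The value is read from the raw query string rather than through parse_qs(), because parse_qs() percent-decodes values (for example, %3B becomes ;), after which splitting on % no longer isolates the city.

Here is the code snippet:

from urllib.parse import urlparse

def city_finder(links):
    cities = []
    for url in links:
        # Raw query string, e.g. 'location=Chicago%2N%23253'
        query = urlparse(url).query
        for pair in query.split('&'):
            key, _, value = pair.partition('=')
            if key == 'location':
                # Keep only the text before the first '%'
                cities.append(value.split('%')[0])
    return cities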
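
To see what the urllib.parse helpers return, here is a minimal sketch using one URL from the question. The variable names parts and decoded are made up for illustration. Note that parse_qs() percent-decodes values, so valid escapes such as %3B no longer show up as % sequences in its result.

```python
from urllib.parse import urlparse, parse_qs

url = 'https://www.zillow.com/82983250?location=Boston%3B%53324'

parts = urlparse(url)
# The raw query string, still percent-encoded
print(parts.query)            # location=Boston%3B%53324

# parse_qs returns a dict mapping each key to a list of decoded values;
# %3B is decoded to ';' and %53 to 'S'
decoded = parse_qs(parts.query)
print(decoded['location'])    # ['Boston;S324']
```

If the decoded value is what you need, parse_qs() is the convenient choice; if the raw text matters, as it does here, work from parts.query directly.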

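Answer 1 (score: 0)

You can use str.find() to locate the index of "location=" and the index of the "%" that follows the city name. Use a list comprehension to loop over the URL list:

def city_finder(website_list):
    # Slice from just after "location=" (9 characters) up to the first "%"
    return [site[site.find("location=")+9:site.find("%")] for site in website_list]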
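
One caveat with str.find() is that it returns -1 when the substring is absent, which would silently produce a wrong slice. Below is a minimal sketch of a guarded variant. The name city_finder_safe, the key parameter, and the decision to skip URLs that lack location= are assumptions for illustration, not part of the original answer.

```python
def city_finder_safe(website_list, key="location="):
    cities = []
    for site in website_list:
        start = site.find(key)
        if start == -1:
            # No location parameter in this URL: skip it
            continue
        start += len(key)
        # Search for '%' only after the key, not from the start of the URL
        end = site.find("%", start)
        if end == -1:
            # No '%' after the city name: take the rest of the string
            end = len(site)
        cities.append(site[start:end])
    return cities
```

Searching for "%" from start onward also avoids picking up a percent sign that appears earlier in the URL.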

Answer 2 (score: 0)

Use the re module to find the word that follows location= in each item of website_list. Use append to add each retrieved location to the city list, then return it.

import re

website_list = ['https://www.zillow.com/62347390?location=Chicago%2N%23253', 'https://www.zillow.com/82983250?location=Boston%3B%53324', 'https://www.zillow.com/12917837?location=Miami%7K%26345']
regexp = re.compile("location=(.*)%")

def city_finder(website_list):
    city = []  # build a fresh list on every call
    for url in website_list:
        # The greedy group may span later '%' parts, so keep only the text before the first one
        city.append(regexp.search(url).group(1).split('%')[0])
    return city

print(city_finder(website_list))

Output:

['Chicago', 'Boston', 'Miami']
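
The pattern location=(.*)% is greedy, so its group runs up to the last % in the URL, which is why the extra split('%') is needed. As a sketch of an alternative (not from the original answer), a non-greedy group stops at the first %. The names greedy and first_percent are hypothetical.

```python
import re

url = 'https://www.zillow.com/62347390?location=Chicago%2N%23253'

greedy = re.compile(r"location=(.*)%")
first_percent = re.compile(r"location=(.*?)%")

# Greedy: captures everything up to the last '%'
print(greedy.search(url).group(1))         # Chicago%2N
# Non-greedy: captures only up to the first '%'
print(first_percent.search(url).group(1))  # Chicago
```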

Answer 3 (score: 0)

As per my comment, you could use

import re

website_list = [
    'https://www.zillow.com/62347390?location=Chicago%2N%23253',
    'https://www.zillow.com/82983250?location=Boston%3B%53324',
    'https://www.zillow.com/12917837?location=Miami%7K%26345',
]

def city_finder(lst=None):
    rx = re.compile(r'location=([^%]+)')
    # Wrapping the match in a one-element list binds it to a name so non-matches can be filtered
    return [city.group(1)
            for item in lst or []
            for city in [rx.search(item)]
            if city]

print(city_finder(website_list))

which yields

['Chicago', 'Boston', 'Miami']
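
The inner for city in [rx.search(item)] clause is a way to bind the match object to a name inside the comprehension so it can be both tested and used. On Python 3.8 or later, an assignment expression can do the same job. The following is a sketch of that variant; city_finder_walrus is a hypothetical name, not something from the original answer.

```python
import re

def city_finder_walrus(lst):
    rx = re.compile(r'location=([^%]+)')
    # (m := ...) binds the match object; items without a match are filtered out
    return [m.group(1) for item in lst if (m := rx.search(item))]
```

Both forms skip URLs that have no location= parameter instead of raising an error.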