Scraping with Beautiful Soup and Python: KeyError: 'href'

Time: 2017-03-22 09:56:24

Tags: python web-scraping beautifulsoup screen-scraping

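I am getting KeyError: 'href'. I gather this is because the attribute is not defined on the tags I selected. I have tried to find a solution, but so far without success. My code is as follows: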

import requests
from bs4 import BeautifulSoup

main_url = "https://www.chapter-living.com/properties/highbury/"
re = requests.get(main_url)
soup = BeautifulSoup(re.text, "html.parser")
city_tags = soup.find_all('h2', class_="title")  # The section containing the links to the cities
cities_links = [main_url + tag['href'] for tag in city_tags]  # Iterates through city_tags and stores them in a [list]

The error is raised when cities_links is evaluated.

2 Answers:

Answer 0: (score: 1)

import requests
from bs4 import BeautifulSoup

main_url = "http://www.chapter-living.com/properties/highbury"
response = requests.get(main_url)  # named `response` so it does not shadow the standard-library `re` module
soup = BeautifulSoup(response.text, "html.parser")
city_tags = soup.find_all('h2', class_="title")
# The href lives on the <a> inside each <h2>; use '' when there is no <a> or no href
cities_links = [main_url + tag.find('a').get('href', '') if tag.find('a') else '' for tag in city_tags]
print(cities_links)

This results in:

[u'http://www.chapter-living.com/properties/highbury/properties/highbury/rooms/bronze-en-suite/', u'http://www.chapter-living.com/properties/highbury/properties/highbury/rooms/silver-en-suite/', u'http://www.chapter-living.com/properties/highbury/properties/highbury/rooms/bronze-studio/', u'http://www.chapter-living.com/properties/highbury/properties/highbury/rooms/bronze-premium-studio/', u'http://www.chapter-living.com/properties/highbury/properties/highbury/rooms/silver-studio/', u'http://www.chapter-living.com/properties/highbury/properties/highbury/rooms/gold-studio/', u'http://www.chapter-living.com/properties/highbury/properties/highbury/rooms/platinum-studio/', u'http://www.chapter-living.com/properties/highbury/properties/highbury/rooms/two-bed-flat/', '', '', '', '', '', '']

Alternatively, you can use the lxml module, which is an order of magnitude faster than BeautifulSoup:

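import requests
from lxml import html

main_url = "http://www.chapter-living.com/properties/highbury"
response = requests.get(main_url)  # named `response` so it does not shadow the standard-library `re` module
root = html.fromstring(response.content)
# The XPath selects the href of every <a> that is a direct child of an <h2 class="title">
cities_links = [main_url + link for link in root.xpath('//h2[@class="title"]/a/@href')]
print(cities_links)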

This results in:

['http://www.chapter-living.com/properties/highbury/properties/highbury/rooms/bronze-en-suite/', 'http://www.chapter-living.com/properties/highbury/properties/highbury/rooms/silver-en-suite/', 'http://www.chapter-living.com/properties/highbury/properties/highbury/rooms/bronze-studio/', 'http://www.chapter-living.com/properties/highbury/properties/highbury/rooms/bronze-premium-studio/', 'http://www.chapter-living.com/properties/highbury/properties/highbury/rooms/silver-studio/', 'http://www.chapter-living.com/properties/highbury/properties/highbury/rooms/gold-studio/', 'http://www.chapter-living.com/properties/highbury/properties/highbury/rooms/platinum-studio/', 'http://www.chapter-living.com/properties/highbury/properties/highbury/rooms/two-bed-flat/']
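Note that the links printed above repeat the properties/highbury segment, which suggests the href values on the page are already root-relative paths. One possible way to avoid the duplication is to resolve each href with urljoin from the standard library instead of concatenating strings. The following is only a sketch, assuming Python 3; the hrefs list is inferred from the output above, not fetched from the site:

```python
from urllib.parse import urljoin

main_url = "http://www.chapter-living.com/properties/highbury"

# Assumed href values, inferred from the output above rather than fetched live
hrefs = [
    "/properties/highbury/rooms/bronze-en-suite/",
    "/properties/highbury/rooms/silver-en-suite/",
]

# urljoin resolves each href against the base URL, much as a browser would
cities_links = [urljoin(main_url, href) for href in hrefs]
print(cities_links)
```

Because these hrefs start with a slash, urljoin replaces the path of main_url rather than appending to it.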

Answer 1: (score: 0)

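h2 tags do not have an href attribute; it belongs to the a tags. That is why you get this error: you are trying to access an attribute that does not exist.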
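To illustrate this, here is a minimal sketch on a small inline snippet. The markup and the html_doc name are made up for the example and are not taken from the real page. Indexing the h2 tag with ['href'] raises KeyError, get() returns None instead of raising, and reading the nested a tag works:

```python
from bs4 import BeautifulSoup

# Hypothetical markup that mimics the structure described in the answers
html_doc = """
<h2 class="title"><a href="/rooms/bronze-en-suite/">Bronze En-Suite</a></h2>
<h2 class="title">No link here</h2>
"""

soup = BeautifulSoup(html_doc, "html.parser")
h2_tags = soup.find_all("h2", class_="title")

# Indexing the <h2> itself fails: the attribute belongs to the nested <a>
try:
    h2_tags[0]["href"]
    raised = False
except KeyError:
    raised = True

# Tag.get() returns None for a missing attribute instead of raising
missing = h2_tags[0].get("href")

# Look up the <a> first and skip headings that have no link
links = [h2.a["href"] for h2 in h2_tags if h2.a is not None and h2.a.has_attr("href")]
print(raised, missing, links)
```

Filtering out headings without a link keeps empty strings out of the result, which may or may not be what you want depending on whether positions in the list matter.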