How to scrape a website with BeautifulSoup

Time: 2018-09-29 18:31:26

Tags: python web-scraping beautifulsoup

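I'm trying to scrape a list from a website, but none of the dealerships I want to extract is wrapped in its own tag. Is there a way to pull them out one dealership at a time instead of as one big block?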

Here is the site I'm trying to scrape:

http://www.autodealerdirectory.us/ca_s_madd.html
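A common approach when entries are not wrapped in their own tags is to treat a separator element as the boundary between entries: walk the container's children and start a new group each time the separator appears. The sketch below assumes, as the answer's code also implies, that entries are bare text nodes separated by <br> and delimited by <hr> inside #bodyText. SAMPLE_HTML and split_on_hr are hypothetical names, and the markup is made up for illustration rather than copied from the real page.

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup (NOT the real page): each entry is bare text
# split by <br> tags, and entries are delimited by <hr> tags.
SAMPLE_HTML = """
<div id="bodyText">
<hr>
Acme Motors<br>123 Main St<br>Example City, CA<br>(555) 010-0000
<hr>
Best Autos<br>9 Oak Ave<br>Example City, CA<br>(555) 010-0001
</div>
"""


def split_on_hr(container):
    """Return one list of text lines per <hr>-delimited block in container."""
    blocks = []
    current = None  # nothing is collected until the first <hr> is seen
    for node in container.children:
        if getattr(node, 'name', None) == 'hr':
            # Each <hr> opens a new group.
            current = []
            blocks.append(current)
        elif current is not None and isinstance(node, str):
            # Text nodes are str subclasses; <br> tags are skipped implicitly.
            text = node.strip()
            if text:
                current.append(text)
    return blocks


soup = BeautifulSoup(SAMPLE_HTML, 'html.parser')
for block in split_on_hr(soup.select_one('#bodyText')):
    print(block)
```

Each dealer comes back as its own list of lines. On the real page you may need to discard groups that are not dealer entries, much as the answer's code skips the first <hr>.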

1 answer:

Answer 0 (score: 1)

import requests
from bs4 import BeautifulSoup

url = 'http://www.autodealerdirectory.us/ca_s_madd.html'

r = requests.get(url, timeout=10)
r.raise_for_status()  # fail loudly on HTTP errors instead of parsing an error page

soup = BeautifulSoup(r.text, 'lxml')  # the 'lxml' parser requires the lxml package

dealers = []

# Each dealer block follows an <hr> inside #bodyText (the first <hr> is skipped).
# The dealer's text lines alternate with single tags such as <br>, so the text
# sits at every other sibling after the <hr>: positions 0, 2, 4 and 6.
for tag in soup.select('#bodyText hr')[1:]:
    siblings = list(tag.next_siblings)
    s = ''.join(str(siblings[i]) for i in (0, 2, 4, 6))
    dealers.append(s)

for dealer in dealers:
    print(dealer.strip())
    print('-----------------------------------------')

This gets the job done. Each dealer's information ends up as one string in the dealers list; you just need to clean up the strings.
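One way to do that cleanup, sketched here as an assumption rather than the answerer's method, is to keep each dealer's text fragments separate instead of concatenating them, then collapse stray whitespace in each fragment and drop the empty ones. clean_fragment, clean_dealer and the raw sample values are hypothetical.

```python
import re


def clean_fragment(text):
    """Collapse every run of whitespace into a single space and trim the ends."""
    # In Python 3, \s in a str pattern also matches Unicode whitespace such as '\xa0'.
    return re.sub(r'\s+', ' ', text).strip()


def clean_dealer(fragments):
    """Normalize each raw text fragment and drop the ones that end up empty."""
    cleaned = (clean_fragment(f) for f in fragments)
    return [f for f in cleaned if f]


# Hypothetical raw fragments, e.g. the text nodes that follow one <hr>.
raw = ['\n  Acme Motors ', '123\xa0Main St\n', '   ', 'Example City,\tCA']
print(' | '.join(clean_dealer(raw)))  # prints "Acme Motors | 123 Main St | Example City, CA"
```

In the answer's loop, you could append the list of fragments for each dealer instead of one concatenated string and pass it through clean_dealer; keeping the fragments apart also stops the name, street and city from running together.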