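I'm trying to scrape a list from a website, but the individual dealerships I want to extract don't have tags of their own. Is there a way to pull each one out separately instead of getting them all as one list?
Here is the website I'm trying to scrape: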
Answer (score: 1)
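import requests
from bs4 import BeautifulSoup

url = 'http://www.autodealerdirectory.us/ca_s_madd.html'
r = requests.get(url)
soup = BeautifulSoup(r.text, 'lxml')

dealers = []
# Skip the first <hr>; each remaining <hr> inside #bodyText is followed by one dealer's entry.
for tag in soup.select('#bodyText hr')[1:]:
    # Concatenate the 1st, 3rd, 5th and 7th siblings after the <hr>.
    # These are the dealer's text lines; the siblings in between are presumably <br> tags.
    s = ''
    s += tag.next_sibling
    s += tag.next_sibling.next_sibling.next_sibling
    s += tag.next_sibling.next_sibling.next_sibling.next_sibling.next_sibling
    s += tag.next_sibling.next_sibling.next_sibling.next_sibling.next_sibling.next_sibling.next_sibling
    dealers.append(s)

# Print each dealer's block, separated by a dashed line.
for dealer in dealers:
    print(dealer.strip())
    print('-----------------------------------------')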
This does the job. Each dealer's information ends up as a separate string in the list dealers; you only need to clean up the strings.
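The fixed offsets above (1st, 3rd, 5th and 7th sibling) assume every dealer has exactly four text lines separated by single tags. A minimal alternative sketch, assuming dealers are separated only by <hr> tags inside #bodyText (as the answer's selector suggests), walks the siblings after each <hr> until the next <hr> and cleans each field along the way. The function name extract_dealers, its skip parameter and the sample markup are hypothetical, and html.parser is used instead of lxml only to avoid the extra dependency.

```python
import re

from bs4 import BeautifulSoup, Comment, Tag


def extract_dealers(html: str, container_id: str = "bodyText", skip: int = 1) -> list[list[str]]:
    """Return one list of cleaned text fields per <hr>-separated block.

    Hypothetical helper: assumes each dealer's lines sit between two <hr>
    tags that are direct siblings inside the container element.
    """
    soup = BeautifulSoup(html, "html.parser")
    container = soup.find(id=container_id)
    if container is None:
        return []

    dealers = []
    # Mirror the answer's [1:]: skip the first `skip` <hr> separators.
    for hr in container.find_all("hr")[skip:]:
        fields = []
        for node in hr.next_siblings:
            if isinstance(node, Tag):
                if node.name == "hr":
                    break  # the next dealer's block starts here
                text = node.get_text(" ")  # e.g. <b>...</b>; <br> yields ""
            elif isinstance(node, Comment):
                continue  # ignore HTML comments
            else:
                text = str(node)  # a bare text node
            # Collapse whitespace runs (including non-breaking spaces) and trim.
            text = re.sub(r"\s+", " ", text).strip()
            if text:
                fields.append(text)
        if fields:
            dealers.append(fields)
    return dealers


if __name__ == "__main__":
    sample = (
        '<div id="bodyText">Intro<hr>Listing header<hr>'
        "Dealer&nbsp;A<br>1 Main St<br>Town,  CA<!-- ad --><hr>"
        "Dealer B<br><b>2 Oak Ave</b><br>City, CA</div>"
    )
    for dealer in extract_dealers(sample):
        print(dealer)
```

Against the real page you would pass requests.get(url).text instead of the sample string. Stopping at the next <hr> rather than counting siblings means a dealer with a missing or extra line does not shift the fields of the following dealers.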