I'm using Beautiful Soup to find and parse street addresses on a page. Eventually, I want to write the street addresses to an Excel document.
Here is the page I'm trying to parse: https://montreal.lufa.com/en/pick-up-points
The page in question lists div elements at the same level under a common class. I haven't been able to parse the individual rows; instead, my code just spits out everything in the class.
My code so far:
from bs4 import BeautifulSoup
import urllib2
URL = "https://montreal.lufa.com/en/pick-up-points"
html = urllib2.urlopen(URL).read().decode('UTF-8')
soup = BeautifulSoup(html, "html5lib")
business = (soup.find('div', class_="info"))
print (business)
Any help is greatly appreciated!
Answer 0 (score: 1)
I would do the following: for each business, find the open-days div and get every previous sibling:
for business in soup.find_all('div', class_="info"):
    days = business.find("div", class_="days")
    print(" ".join(sibling.get_text(strip=True)
                   for sibling in reversed(days.find_previous_siblings())))
This prints:
1600, René-Lévesque west 1600, René-Lévesque west Montreal, Quebec H3H 1P9
555 Chabanel Street West 555 Chabanel Street West Montreal, Quebec H2N 2H8
À la Boîte à Fleurs 3266 Saint-Rose Boulevard Laval, Quebec H7P 4K8
Allez Up Centre d'escalade 1555 St-Patrick Montreal, Quebec H3K 2B7
...
YMCA Cartierville 11885 Laurentien Boulevard Montreal, Quebec H4J 2R5
Zone, Real estate Agency 200 rue St-Jean Longueuil, Quebec J4H 2X5
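The sibling-walking trick above can be seen on a small, self-contained fragment. This is a minimal sketch, and the fragment's structure (`name`, `address`, and `city` divs sitting at the same level as the `days` div inside each `info` div) is an assumption modeled on the answer, not copied from the live page:

```python
from bs4 import BeautifulSoup

# Hypothetical fragment mimicking the assumed layout of one pick-up point:
# the address pieces are siblings that come *before* the "days" div.
html = """
<div class="info">
  <div class="name">Allez Up Centre d'escalade</div>
  <div class="address">1555 St-Patrick</div>
  <div class="city">Montreal, Quebec H3K 2B7</div>
  <div class="days">Tuesday, Thursday</div>
</div>
"""

soup = BeautifulSoup(html, "html.parser")
for business in soup.find_all("div", class_="info"):
    days = business.find("div", class_="days")
    # find_previous_siblings() walks backwards (city, address, name),
    # so reverse it to restore document order before joining.
    address = " ".join(sibling.get_text(strip=True)
                       for sibling in reversed(days.find_previous_siblings()))
    print(address)
```

Because `find_previous_siblings()` returns elements nearest-first, the `reversed()` call is what puts the business name back in front of the street and city.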
Answer 1 (score: 1)
Cool, alecxe! Here is what worked on my machine:
#1) In Console:
pip install lxml
#2) Run script below:
from bs4 import BeautifulSoup
import urllib2

URL = "https://montreal.lufa.com/en/pick-up-points"
html = urllib2.urlopen(URL).read().decode('UTF-8')
soup = BeautifulSoup(html, "lxml")

for business in soup.find_all('div', class_="info"):
    days = business.find("div", class_="days")
    print(" ".join(sibling.get_text(strip=True)
                   for sibling in reversed(days.find_previous_siblings())))
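The original question also asked about writing the addresses out to Excel. A minimal sketch using only the standard-library `csv` module follows; the `addresses` list and the `pickup_points.csv` filename are placeholders (in the real script the rows would come from the scraping loop above), and for a true `.xlsx` file a library such as openpyxl would be needed instead:

```python
import csv

# Placeholder data: in practice, collect these strings inside the
# BeautifulSoup loop rather than hard-coding them.
addresses = [
    "Allez Up Centre d'escalade 1555 St-Patrick Montreal, Quebec H3K 2B7",
    "Zone, Real estate Agency 200 rue St-Jean Longueuil, Quebec J4H 2X5",
]

with open("pickup_points.csv", "w") as f:
    writer = csv.writer(f)
    writer.writerow(["address"])  # header row
    for addr in addresses:
        writer.writerow([addr])
```

Excel opens `.csv` files directly, so this is often enough when the goal is just to get the scraped rows into a spreadsheet.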