Question

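I'm putting together a web scraper to collect Goodwill store location data from a list of ZIP codes. I've done this many times before for other stores, but Goodwill's website seems to be different. This is how the div I'm trying to scrape is set up: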

<div class="contact">4300 W 36 1/2 St<br>St Louis Park, MN 55416<br><div 
 class="phone">(952) 922-9640</div><a onclick="ga('send', 'event', 
 'Locator', 'Clicked Location Website Link', 'http://www.seconddebut.org');" 
 class="website" href="http://www.seconddebut.org">Visit Website</a></div>

I want to scrape the street address, city, state and ZIP code from this div. I have tried this code:

from bs4 import BeautifulSoup

htmlSource = driver.page_source  # driver is the Selenium WebDriver instance
soup = BeautifulSoup(htmlSource, 'html.parser')
stores = soup.find("div", attrs={"class":"contact"})
for store in stores:
    print(store.get_text())

I also tried:

soup = BeautifulSoup(htmlSource, 'html.parser')
stores = soup.find("div", attrs={"class":"contact"})
children = stores.findChildren("br", recursive=False)
for child in children:
    print(child)

Neither option works for me. Any help would be greatly appreciated!
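A likely explanation for both failures, sketched against a trimmed copy of the markup above (assumes `bs4` is installed; `address_lines` is a hypothetical helper name). `find()` returns a single `Tag`, so the first loop iterates over that div's direct children rather than over a list of stores. In the second attempt, each `<br>` is an empty tag, so printing it shows only `<br/>`; the address text lives in the text nodes next to the breaks.

```python
from bs4 import BeautifulSoup, NavigableString

# Trimmed copy of the question's markup (onclick attribute omitted).
MARKUP = (
    '<div class="contact">4300 W 36 1/2 St<br>St Louis Park, MN 55416<br>'
    '<div class="phone">(952) 922-9640</div>'
    '<a class="website" href="http://www.seconddebut.org">Visit Website</a></div>'
)

soup = BeautifulSoup(MARKUP, "html.parser")
stores = soup.find("div", attrs={"class": "contact"})  # a single Tag, not a list

# Attempt 1: iterating a Tag walks its direct children (text nodes and tags).
kinds = [type(child).__name__ for child in stores]
print(kinds)


# Attempt 2: <br> tags are empty, so read the text node sitting directly
# before each top-level <br> via previous_sibling instead.
def address_lines(contact):
    """Return the non-empty text nodes that directly precede each top-level <br>."""
    lines = []
    for br in contact.find_all("br", recursive=False):
        text = br.previous_sibling
        # Skip a <br> preceded by another tag, by nothing, or by whitespace.
        if isinstance(text, NavigableString) and text.strip():
            lines.append(text.strip())
    return lines


print(address_lines(stores))
```

The `previous_sibling` approach relies on the address lines always being plain text immediately before a `<br>`; if the site wraps them in other tags, this would need adjusting.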

Answer 1

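Try Selenium:

driver.find_element_by_css_selector('selector path').text
# Selenium 4.3+ removed the find_element_by_* helpers; the equivalent is
# driver.find_element(By.CSS_SELECTOR, 'selector path').text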
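In Selenium, `WebElement.text` returns the element's rendered text, so each `<br>` and the nested phone `<div>` should come back as line breaks. A minimal sketch, assuming Selenium 4 (`find_elements` with `By.CSS_SELECTOR`) and a `div.contact` selector taken from the question's markup. `split_contact_text` and `scrape_contacts` are hypothetical helper names, the `sample` string is an illustrative guess at the rendered text, and the split assumes the street and city lines always come first.

```python
def split_contact_text(text):
    """Split a contact block's rendered text into (street, city_line)."""
    lines = [line.strip() for line in text.splitlines() if line.strip()]
    # Assumes the first two non-empty lines are the street and "City, ST ZIP".
    return lines[0], lines[1]


def scrape_contacts(driver):
    """Return (street, city_line) for every div.contact on the current page."""
    # Imported here so split_contact_text() can be used without Selenium installed.
    from selenium.webdriver.common.by import By

    elements = driver.find_elements(By.CSS_SELECTOR, "div.contact")
    return [split_contact_text(el.text) for el in elements]


# Illustrative rendered text for the question's div (not captured from the live site).
sample = "4300 W 36 1/2 St\nSt Louis Park, MN 55416\n(952) 922-9640\nVisit Website"
print(split_contact_text(sample))
```

Keeping the parsing in a separate pure function makes it easy to check without launching a browser.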

Answer 2

Assuming they all follow the same pattern, something like the following should work:

from bs4 import BeautifulSoup

markup = r"""
<div class="contact">4300 W 36 1/2 St<br>St Louis Park, MN 55416<br><div 
 class="phone">(952) 922-9640</div><a onclick="ga('send', 'event', 
 'Locator', 'Clicked Location Website Link', 'http://www.seconddebut.org');" 
 class="website" href="http://www.seconddebut.org">Visit Website</a></div>
"""

soup = BeautifulSoup(markup, "html.parser")

store = soup.find("div", attrs={"class": "contact"})
print(list(store.strings)[:2])

Result:

['4300 W 36 1/2 St', 'St Louis Park, MN 55416']
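The question also asks for city, state and ZIP code as separate fields. Building on this answer, a sketch that loops over every `div.contact` with `find_all()` and splits the second string with a regular expression. The pattern assumes US-style `City, ST 12345` lines (optionally ZIP+4), and `parse_store` is a hypothetical helper name.

```python
import re

from bs4 import BeautifulSoup

# "City, ST 12345" or "City, ST 12345-6789"; city is everything before the comma.
CITY_LINE = re.compile(r"^(?P<city>.+?),\s*(?P<state>[A-Z]{2})\s+(?P<zip>\d{5}(?:-\d{4})?)$")


def parse_store(contact):
    """Return street/city/state/zip for one div.contact, or None if it doesn't match."""
    strings = list(contact.stripped_strings)
    if len(strings) < 2:
        return None
    match = CITY_LINE.match(strings[1])
    if match is None:
        return None
    return {"street": strings[0], **match.groupdict()}


markup = (
    '<div class="contact">4300 W 36 1/2 St<br>St Louis Park, MN 55416<br>'
    '<div class="phone">(952) 922-9640</div>'
    '<a class="website" href="http://www.seconddebut.org">Visit Website</a></div>'
)

soup = BeautifulSoup(markup, "html.parser")
stores = [parse_store(div) for div in soup.find_all("div", attrs={"class": "contact"})]
print(stores)
```

Returning `None` for blocks that don't fit the pattern makes it easy to spot stores whose markup differs, instead of silently storing a wrong city or ZIP.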

Div text not showing with Selenium Python
