Scraping Yellow Pages with Python 3

Posted: 2016-12-31 03:32:44

Tags: python web-scraping beautifulsoup

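I am trying to scrape data from Yellow Pages, but I can't get the text of each business name and address/phone number. I am using the code below. Where am I going wrong? For now I just print each business's text so I can check it while testing; once it works, I will save the data to a CSV file.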

import csv
import requests
from bs4 import BeautifulSoup

# Don't worry about opening this file
"""with open('cities_louisiana.csv','r') as cities:
    lines = cities.read().splitlines()
cities.close()"""

for city in lines:
    print(city)
url = "http://www.yellowpages.com/search?search_terms=businesses&geo_location_terms=amite+LA&page="+str(count)

for city in lines:
    for x in range (0, 50):
        print("http://www.yellowpages.com/search?search_terms=businesses&geo_location_terms=amite+LA&page="+str(x))
        page = requests.get("http://www.yellowpages.com/search?search_terms=businesses&geo_location_terms=amite+LA&page="+str(x))
        soup = BeautifulSoup(page.text, "html.parser")
        name = soup.find_all("div", {"class": "v-card"})
        for name in name:
            try:
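                # print() returns None, so .find_all() is called on None here (and a
                # find_all() result has no .text either); the bare except below hides the AttributeError
                print(name.contents[0]).find_all(class_="business-name").text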
                #print(name.contents[1].text)
            except:
                pass
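The loop above iterates over `lines` (one city per line) but always requests `amite+LA`. A minimal sketch of building the URL from the city instead, using the standard library's `urllib.parse.urlencode`; the query parameter names are copied from the question's URL, while the helper name `search_url`, the `"amite LA"`-style city strings, and page numbering starting at 1 are assumptions for illustration:

```python
from urllib.parse import urlencode

BASE_URL = "http://www.yellowpages.com/search"

def search_url(city, page):
    """Build a search URL for one city and one result page (hypothetical helper)."""
    params = {
        "search_terms": "businesses",
        "geo_location_terms": city,  # urlencode turns spaces into "+", e.g. "amite LA" -> "amite+LA"
        "page": page,
    }
    return BASE_URL + "?" + urlencode(params)

# Assumed city list; in the question these would come from cities_louisiana.csv
for city in ["amite LA"]:
    for page in range(1, 3):
        print(search_url(city, page))
```

Letting `urlencode` do the escaping avoids hand-concatenating strings, which is how stray characters such as a space after `?` slip into a URL.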

1 Answer:

Answer (score: 4)

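You should iterate over the search results and then, for each result, find the business name (the element with the "business-name" class) and the address (the element with the "adr" class):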

for result in soup.select(".search-results .result"):
    name = result.select_one(".business-name").get_text(strip=True, separator=" ")
    address = result.select_one(".adr").get_text(strip=True, separator=" ")

    print(name, address)
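A self-contained variant of the loop above that can be tried without a network request. It parses a hard-coded HTML snippet whose markup is hypothetical, written only to match the answer's selectors (the real Yellow Pages HTML may differ or may have changed). It also checks for `None`, because `select_one()` returns `None` when nothing matches and calling `.get_text()` on it would raise an `AttributeError`; `extract_listings` is a hypothetical helper name:

```python
from bs4 import BeautifulSoup

# Hypothetical markup shaped to match the selectors used in the answer
SAMPLE_HTML = """
<div class="search-results">
  <div class="result">
    <a class="business-name"><span>Joe's Diner</span></a>
    <div class="adr"><span>101 Main St</span> <span>Amite, LA</span></div>
  </div>
  <div class="result">
    <a class="business-name"><span>Amite Hardware</span></a>
  </div>
</div>
"""

def extract_listings(html):
    """Return (name, address) pairs; address is None when a result has no .adr element."""
    soup = BeautifulSoup(html, "html.parser")
    listings = []
    for result in soup.select(".search-results .result"):
        name_tag = result.select_one(".business-name")
        adr_tag = result.select_one(".adr")
        if name_tag is None:
            continue  # skip results that have no business-name element
        name = name_tag.get_text(strip=True, separator=" ")
        address = adr_tag.get_text(strip=True, separator=" ") if adr_tag else None
        listings.append((name, address))
    return listings

for name, address in extract_listings(SAMPLE_HTML):
    print(name, address)
```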

.select() and .select_one() are very handy CSS selector methods.
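The question imports `csv` and plans to save the results once printing works. A minimal sketch with the standard library's `csv.writer`, assuming the scraped data is collected as `(name, address)` tuples like the ones printed above; the helper `write_listings`, the sample row, and the file name `businesses.csv` are hypothetical:

```python
import csv
import io

def write_listings(rows, out):
    """Write (name, address) pairs to a file-like object as CSV, with a header row."""
    writer = csv.writer(out)
    writer.writerow(["name", "address"])
    writer.writerows(rows)

# In-memory buffer for trying it out; for a real file, open it with newline="":
#   with open("businesses.csv", "w", newline="", encoding="utf-8") as f:
#       write_listings(rows, f)
buffer = io.StringIO()
write_listings([("Joe's Diner", "101 Main St, Amite, LA")], buffer)
print(buffer.getvalue())
```

`csv.writer` quotes fields that contain commas, so an address such as `101 Main St, Amite, LA` stays in a single column.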