Collecting data from the individual results in a provider list

Time: 2019-05-15 07:40:54

Tags: python beautifulsoup python-requests

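I'm trying to get the addresses of all therapists for a given ZIP code. I want to enter a ZIP code and get the list of results, then open each individual result and scrape the provider's address.

I'm new to Python. I've been trying to use requests and BeautifulSoup. Would Selenium be a better choice?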


I'm stuck now and don't know how to proceed. P.S. I'm taking a Python course. Please be kind.

1 answer:

Answer 0 (score: 1)

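Try the code below. It collects the address of every therapist listed for the given ZIP code.

Note that it only returns addresses from the first page of results. If you need the addresses from every page, you should use Selenium, which should handle that.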

import requests
from bs4 import BeautifulSoup
from bs4.element import Tag

url = 'https://www.psychologytoday.com/us/therapists/60148'
# Send a browser-like User-Agent with every request.
headers = {'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_5) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/50.0.2661.102 Safari/537.36'}

page = requests.get(url, headers=headers)
page.raise_for_status()  # stop early on an HTTP error
soup = BeautifulSoup(page.text, 'html.parser')
result = soup.find(class_='results-column')
if result is None:
    raise SystemExit("No element with class 'results-column' was found")

addresses = []

# Walk the direct children of the results column.
for tag in result:
    # Skip text nodes and any element that is not a result row.
    if not isinstance(tag, Tag) or "row" not in (tag.get("class") or []):
        continue

    # The 'result-actions' block holds the link to the provider's own page.
    actions = tag.find(class_='result-actions')
    link = actions.find('a', href=True) if actions else None
    if link is None:
        continue

    # Second request: the provider's page carries the full address.
    profile = requests.get(link['href'], headers=headers)
    profile_soup = BeautifulSoup(profile.text, 'html.parser')

    container = profile_soup.find(class_='address')
    address = container.find(class_='location-address-phone') if container else None
    if address is None:
        continue

    # Join the non-empty lines of the address block with commas.
    parts = [line.strip() for line in address.text.split('\n') if line.strip()]
    if parts:
        addresses.append(','.join(parts))

print(addresses)

Output:

['Lia Reynolds, LCSW,Lombard, Illinois 60148,(630) 343-5819', 'Clarity Counseling and Wellness, LLC,477 Butterfield Road,#202,Lombard, Illinois 60148,(630) 656-9713', '450 East 22nd St.,Suite 172,Lombard, Illinois 60148,(773) 599-3959', '10 E 22nd Street,Suite 217,Lombard, Illinois 60148,(630) 517-9505', 'Ron Ahlberg & Associates,477 E Butterfield Rd,Suite 310,Lombard, Illinois 60148,(630) 451-8653', 'Health Transitions Counseling,477 Butterfield Road,Suite 310,Lombard, Illinois 60148,(630) 785-6642', 'Way Beyond Counseling and Coaching,477 E Butterfield Road,Floor 3 - Wellness Center - Office 7,Lombard, Illinois 60148,Call Mr. Larry Westenberg,(630) 556-8484', 'Chicago Area Behavioral Health Services,150 W St Charles Road,Lombard, Illinois 60148,Call Augustus Edeh. Chicago Area Behavioral Health Services,(630) 599-8032', 'Adult Children Center, Ltd,2 East 22nd Street,Suite 302,Lombard, Illinois 60148,(630) 387-9750', 'Midwest Center for Hope & Healing, Ltd.,1165 S Westmore-meyers Rd,Lombard, Illinois 60148,(630) 765-5355', 'Madrigal Consulting and Counseling, LLP,450 E. 22nd Street,Suite 150,Lombard, Illinois 60148,Call Cesar Madrigal,(630) 413-9942', '477 E Butterfield Rd,Suite 202,Lombard, Illinois 60148,(630) 560-6920', 'Lombard,Lombard, Illinois 60148,(630) 796-7904', 'Dupage Clinical Counseling Services,450 E 22nd St,150,Lombard, Illinois 60148,(630) 313-4990', '2200 S Main St,Suite 316,Lombard, Illinois 60148,(630) 426-7819', 'Institute for Motivational Development,10 E 22nd Street, Suite 217,Lombard, Illinois 60148,(309) 723-8170', 'Michele DeCanio Counseling Services,2200 S. Main Street,Suite 305,Lombard, Illinois 60148,(630) 560-6926', 'A New Day Counseling Center,450 E 22nd St,Suite 150,Lombard, Illinois 60148,(630) 748-8261', '477 E Butterfield Rd,Suite 310,Lombard, Illinois 60148,(630) 426-6878', 'Bricolage Wellness,477 Butterfield Road,Suite 202,Lombard, Illinois 60148,(630) 426-7823']
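Each entry in the output is the non-empty lines of one address block, stripped and joined with commas. The sketch below isolates that step so it can be tried without any network request. The helper name `join_address_lines` and the whitespace layout of `sample` are assumptions made for illustration; only the address values themselves come from the output above.

```python
def join_address_lines(raw_text):
    """Collapse a multi-line address block into one comma-separated string."""
    # Strip every line, then drop the ones that are empty after stripping.
    parts = [line.strip() for line in raw_text.split("\n")]
    return ",".join(part for part in parts if part)


# Assumed layout: indented lines separated by blank lines, similar to what
# the .text of an HTML element often looks like.
sample = (
    "\n  477 Butterfield Road\n\n  Suite 202\n"
    "  Lombard, Illinois 60148\n  (630) 426-7823\n"
)

joined = join_address_lines(sample)
print(joined)
```

Because the pieces are joined with a bare comma, the result cannot be split back reliably on "," (the city line already contains one). If the parts are needed separately later, keeping the list of lines instead of the joined string may be the safer choice.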

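Here 'result-actions' is the class of the action ("view") button that opens the provider's page, so a second request is needed to get the full address.

"location-address-phone" is the class on that provider page from which the address is scraped.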

Documentation links:

https://selenium-python.readthedocs.io/

https://www.crummy.com/software/BeautifulSoup/bs4/doc/