How do I scrape the data I need using Python requests?

Time: 2018-04-01 17:33:02

Tags: python web-scraping python-requests

I am trying to get NYC daycare data from https://a816-healthpsi.nyc.gov/ChildCare/SearchAction2.do

I have tried both requests.get and requests.post, but the response does not contain the daycare table that the site shows after clicking the Search button.

Using requests.get:

import requests

# The URL must be one unbroken string, with no space or line break inside it.
response = requests.get('https://a816-healthpsi.nyc.gov/ChildCare/SearchAction2.do?pager.offset=10')
if response.status_code == 200:
    print(response.text)

Using requests.post:

url2 = 'https://a816-healthpsi.nyc.gov/ChildCare/SearchAction2.do'
payload = {'pager.offset': '30'}
r1 = requests.post(url2, data=payload)
print(r1.text)

1 Answer:

Answer 0 (score: 0):

Do it like this:

import requests
from bs4 import BeautifulSoup

URL = 'https://a816-healthpsi.nyc.gov/ChildCare/SearchAction2.do'
payload = 'linkPK=0&pageroffset=0&getNewResult=true&progTypeValues=&search=1&facilityName=&borough=&permitNo=&neighborhood=&building=&street=&zipCode='

with requests.Session() as s:
    # The payload is a pre-encoded string, so the form Content-Type is set by hand.
    s.headers.update({
        'User-Agent': 'Mozilla/5.0',
        'Content-Type': 'application/x-www-form-urlencoded',
    })
    htmldoc = s.post(URL, data=payload)
    soup = BeautifulSoup(htmldoc.content, 'lxml')
    table = soup.find('table', {'id': 'tech-companies'})
    # Guard against a missing table so the loop does not fail on None.
    for items in (table.find_all('tr') if table else []):
        # Collapse runs of whitespace inside each cell.
        data = [' '.join(item.text.split()) for item in items.find_all('td')]
        print(data)
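
The form body in the script above is one long hand-written string, which is easy to mistype. As a minimal sketch using only the standard library, the same string can be split into a dict so that single fields can be changed. The helper names payload_to_dict and with_field are hypothetical, and which key the server actually uses for paging is an assumption: the answer sends "pageroffset" while the question used "pager.offset", and the source confirms neither.

```python
from urllib.parse import parse_qsl, urlencode

# Form body copied verbatim from the answer.
RAW_PAYLOAD = (
    "linkPK=0&pageroffset=0&getNewResult=true&progTypeValues=&search=1"
    "&facilityName=&borough=&permitNo=&neighborhood=&building=&street=&zipCode="
)


def payload_to_dict(raw):
    """Split a urlencoded form body into a dict, keeping empty fields."""
    return dict(parse_qsl(raw, keep_blank_values=True))


def with_field(form, key, value):
    """Return a copy of the form with one field changed; the input is untouched."""
    updated = dict(form)
    updated[key] = str(value)
    return updated


form = payload_to_dict(RAW_PAYLOAD)

# Assumption: "pageroffset" is the paging key (see the note above).
next_page = with_field(form, "pageroffset", 10)

# Re-encode to inspect the body that would be sent.
print(urlencode(next_page))
```

A dict like form can be passed as data=form to s.post. When data is a dict, requests encodes it as a form body and sets the Content-Type header itself, so the manual header would not be needed in that case.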

Partial output of the scraping script:

['"THE STUDIO SCHOOL"', 'Child Care - Pre School', '117 WEST 95TH STREET', '10025', '212-678-2416', 'Permitted']
['1199 FUTURE OF AMERICA LEARNING CENTER', 'Camp', '2514 CRESTON AVENUE', '10468', '718-562-2915', 'In-Renewal']
['1199 S E I U / EMPLOYER CHILD CARE CORPORATION', 'Child Care - Infants/Toddlers', '2500 CRESTON AVENUE', '10468', '718-562-2915', 'Permitted']
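
Each printed row is a plain list of six strings, so it may be convenient to turn the rows into named records and save them. The following is only a sketch: the column names are guesses inferred from the sample rows, not taken from the site, and rows_to_records and records_to_csv are hypothetical helpers. Rows without any td cells (for example a header row built from th cells) come out of the loop as empty lists, so they are skipped here.

```python
import csv
import io

# Assumed column names, inferred from the sample output (not from the site).
COLUMNS = ["name", "program_type", "address", "zip_code", "phone", "permit_status"]

# Two rows copied from the partial output, plus an empty row such as
# the loop produces for a <tr> that has no <td> cells.
rows = [
    [],
    ['1199 FUTURE OF AMERICA LEARNING CENTER', 'Camp',
     '2514 CRESTON AVENUE', '10468', '718-562-2915', 'In-Renewal'],
    ['1199 S E I U / EMPLOYER CHILD CARE CORPORATION',
     'Child Care - Infants/Toddlers', '2500 CRESTON AVENUE', '10468',
     '718-562-2915', 'Permitted'],
]


def rows_to_records(rows):
    """Keep only rows with one value per column and map them to dicts."""
    return [dict(zip(COLUMNS, row)) for row in rows if len(row) == len(COLUMNS)]


def records_to_csv(records):
    """Serialize the records to CSV text with a header line."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=COLUMNS, lineterminator="\n")
    writer.writeheader()
    writer.writerows(records)
    return buffer.getvalue()


records = rows_to_records(rows)
csv_text = records_to_csv(records)
print(csv_text)
```

In the scraping script, the same idea would mean appending data to a list instead of printing it, then passing that list to rows_to_records.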