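I am trying to fetch NYC daycare data from https://a816-healthpsi.nyc.gov/ChildCare/SearchAction2.do
I have tried both requests.get and requests.post, but the response does not contain the daycare table that the site shows when you click the search button.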
import requests

# Attempt 1: GET with the pager offset in the query string
url = requests.get('https://a816-healthpsi.nyc.gov/ChildCare/SearchAction2.do?pager.offset=10')
if url.status_code == 200:
    response = url.text
    print(url.text)

# Attempt 2: POST with the pager offset as form data
url2 = 'https://a816-healthpsi.nyc.gov/ChildCare/SearchAction2.do'
payload = {'pager.offset': '30'}
r1 = requests.post(url2, data=payload)
print(r1.text)
Answer 0 (score: 0)
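Try this. It posts the same form fields the search button submits and then reads the rows of the results table: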
import requests
from bs4 import BeautifulSoup

URL = 'https://a816-healthpsi.nyc.gov/ChildCare/SearchAction2.do'
# Form body sent with the search request, already URL-encoded
payload = 'linkPK=0&pageroffset=0&getNewResult=true&progTypeValues=&search=1&facilityName=&borough=&permitNo=&neighborhood=&building=&street=&zipCode='

with requests.Session() as s:
    s.headers = {"User-Agent": "Mozilla/5.0"}
    # The payload is a raw string, so the form content type is set explicitly
    s.headers.update({'Content-Type': 'application/x-www-form-urlencoded'})
    htmldoc = s.post(URL, data=payload)
    soup = BeautifulSoup(htmldoc.content, "lxml")
    # The results are in the table with id "tech-companies"
    table = soup.find("table", {'id': 'tech-companies'})
    for items in table.find_all("tr"):
        # Collapse runs of whitespace inside each cell
        data = [' '.join(item.text.split()) for item in items.find_all("td")]
        print(data)
Partial output:
['"THE STUDIO SCHOOL"', 'Child Care - Pre School', '117 WEST 95TH STREET', '10025', '212-678-2416', 'Permitted']
['1199 FUTURE OF AMERICA LEARNING CENTER', 'Camp', '2514 CRESTON AVENUE', '10468', '718-562-2915', 'In-Renewal']
['1199 S E I U / EMPLOYER CHILD CARE CORPORATION', 'Child Care - Infants/Toddlers', '2500 CRESTON AVENUE', '10468', '718-562-2915', 'Permitted']
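If the long payload string is hard to read or edit, it may help to build it from a dict instead. The following is a minimal sketch using only the standard library: the field names are copied from the payload above, while build_payload and its offset parameter are hypothetical names introduced here for illustration. Whether the server actually uses pageroffset for paging is an assumption I have not verified.

```python
from urllib.parse import urlencode


def build_payload(offset=0):
    """Return the URL-encoded form body used in the POST above.

    `build_payload` and `offset` are hypothetical helpers; the field
    names and default values are taken from the payload string above.
    """
    fields = {
        'linkPK': '0',
        'pageroffset': offset,  # assumption: this field controls paging
        'getNewResult': 'true',
        'progTypeValues': '',
        'search': '1',
        'facilityName': '',
        'borough': '',
        'permitNo': '',
        'neighborhood': '',
        'building': '',
        'street': '',
        'zipCode': '',
    }
    # urlencode keeps the dict's insertion order and emits "key=" for
    # empty values, so offset=0 reproduces the original string.
    return urlencode(fields)


payload = build_payload()
print(payload)
```

The result can be passed to s.post(URL, data=payload) exactly as before. Passing the dict itself as data should also work, since requests form-encodes dicts and sets the content type on its own.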