I want to scrape the result names from the following website: RAJASTHAN TECHNICAL UNIVERSITY, KOTA
Here is my current Python code:
import requests
from bs4 import BeautifulSoup
headers = {'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/39.0.2171.95 Safari/537.36'}
response = requests.get(url, headers=headers)
soup = BeautifulSoup(response.text, "lxml")
print(soup)
When I try to print the soup, I get the following error:
Couldn't access the inner elements of frame name="mainFrame" src="mainpage.php"/>
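Answer 0 (score: 0)
There is an iframe to deal with. You can use the iframe src as the starting point instead, and collect from it the parameters needed for a POST request that matches the one the results page makes: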
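import requests
from bs4 import BeautifulSoup as bs

with requests.Session() as s:
    # Send the Referer of the results page with every request in this session
    s.headers = {'Referer': 'http://www.esuvidha.info/rtURes/ResMainpage.php'}
    # Start from the frame's src rather than the outer page
    r = s.get('http://www.esuvidha.info/mainpage.php')
    soup = bs(r.content, 'lxml')
    # Collect the hidden form fields (id -> value) to send back in the POST
    data = {i['id']: i['value'] for i in soup.select('[type="hidden"][id]')}
    r = s.post('http://www.esuvidha.info/rtURes/ResMainpage.php', data=data)
    soup = bs(r.content, 'lxml')
    # The result name sits in a <font> directly under a <fieldset>
    print(soup.select_one('fieldset > font').text)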
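If you would rather discover the frame address than hard-code it, something along these lines might work. This is only a rough sketch using the standard library so it runs without network access: `FrameSrcParser`, `frame_urls`, `SAMPLE_HTML` and `BASE_URL` are names made up for illustration, the sample markup is modelled on the frame tag in your error message, and the base URL is an assumption about where the outer page lives.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class FrameSrcParser(HTMLParser):
    """Collects the src attribute of every <frame> and <iframe> tag."""

    def __init__(self):
        super().__init__()
        self.sources = []

    def handle_starttag(self, tag, attrs):
        # Self-closing tags such as <frame ... /> also end up here,
        # because the default handle_startendtag calls handle_starttag.
        if tag in ("frame", "iframe"):
            src = dict(attrs).get("src")
            if src:
                self.sources.append(src)


def frame_urls(html, base_url):
    """Return absolute URLs for all frames found in html."""
    parser = FrameSrcParser()
    parser.feed(html)
    # Frame src values are usually relative, so resolve them against the page URL
    return [urljoin(base_url, src) for src in parser.sources]


# Hypothetical outer page, modelled on the frame tag from the error message
SAMPLE_HTML = (
    '<frameset rows="*">'
    '<frame name="mainFrame" src="mainpage.php"/>'
    '</frameset>'
)
# Assumed address of the outer page; replace with the URL you actually request
BASE_URL = "http://www.esuvidha.info/"

print(frame_urls(SAMPLE_HTML, BASE_URL))
```

In a real run you would pass `response.text` and the URL you requested instead of the sample values, then feed the resulting address to `s.get(...)` in the session code above.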