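Here is my code. The URL does not change after I submit the registration number, so I cannot reach the results page. Can someone help me get this working?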
from bs4 import BeautifulSoup
import requests
r = requests.get('http://www.mpmedicalcouncil.net/smr_database.html')
soup = BeautifulSoup(r.text,'lxml')
links = soup.find('input',{"id":"FormsEditRegNo"})
Answer 0 (score: 0)
Use Selenium to fill in the input field, then click Submit:
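from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.common.by import By
# Selenium 4 style: the find_element_by_* helpers were removed in favour of By locators
driver = webdriver.Chrome(service=Service("C:/chromedriver_win32/chromedriver.exe"))
driver.get('http://www.mpmedicalcouncil.net/smr_database.html')
driver.find_element(By.ID, "FormsEditRegNo").send_keys("123456789")
driver.find_element(By.CSS_SELECTOR, 'input[value="Submit"]').click()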
You can then parse the table with pandas, Selenium, or BeautifulSoup:
soup = BeautifulSoup(driver.page_source,'lxml')
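If you would rather not pull in another dependency for the parsing step, here is a rough sketch that reads table rows with only the standard library's `html.parser`. It is a swapped-in alternative, not what the answer above uses. `TableRows`, `table_rows`, and `sample_html` are made-up names, and the sample markup only stands in for `driver.page_source`; the real page layout may differ.

```python
from html.parser import HTMLParser


class TableRows(HTMLParser):
    """Collect the text of every <td>/<th> cell, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []     # finished rows
        self._row = None   # cells of the row being read, None outside <tr>
        self._cell = None  # text fragments of the cell being read

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        # Text outside a cell (e.g. whitespace between tags) is ignored
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            # Collapse runs of whitespace inside the cell
            self._row.append(" ".join("".join(self._cell).split()))
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None
            self._cell = None


def table_rows(html):
    """Return all table rows found in an HTML string as lists of cell text."""
    parser = TableRows()
    parser.feed(html)
    parser.close()
    return parser.rows


# Made-up markup standing in for driver.page_source
sample_html = """
<table width="590">
  <tr><th>Sr. No.</th><th>Registration No.</th></tr>
  <tr><td>1</td><td> 12345 </td></tr>
</table>
"""
print(table_rows(sample_html))
```

This sketch does not handle nested tables or rows with omitted closing tags, so for messy pages BeautifulSoup or pandas is probably the safer choice.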
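Answer 1 (score: 0)
I think Selenium may be overkill for this site. You can use requests to mimic the POST request the page makes (in most browsers you can find the URL and parameters in the Network tab of the inspection tools). The form data is shown there too.
This gives you the response text. You can then use BeautifulSoup or pandas (or both) to pull out the data you need.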
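import io
import requests
from bs4 import BeautifulSoup
import pandas as pd
reg_no = 12345
# Form fields that the search page submits
payload = {
    'Name': '',
    'RegNo': reg_no,
    'FormsButton1': 'Submit',
}
r = requests.post('http://www.mpmedicalcouncil.net/smr_database_search_result.asp', data=payload)
soup = BeautifulSoup(r.text, 'html.parser')
# The results sit in the table with width="590"
data_table = soup.find('table', attrs={'width': '590'})
# Wrap the markup in StringIO; newer pandas versions warn about literal HTML strings
print(pd.read_html(io.StringIO(str(data_table)))[0])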
Output
0 1 2 ... 5 6 7
0 Sr. No. Council Name ... Registration No. Qualification Year of Reg.
1 1 Madhya Pradesh Medical Council Gawali Deepak ... 12345 MBBS 2011
2 2 Medical Council Bhopal Maheshwari, Shyam Sunder ... 12345 MBBS 1993
[3 rows x 8 columns]
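To see what the POST above puts on the wire: when you pass a dict as `data=`, requests sends it as a form-encoded body. The sketch below builds roughly the same body with the standard library so you can compare it against the form data in the browser's Network tab. `build_form_body` is a made-up helper name; the field names are the ones from the payload above.

```python
from urllib.parse import parse_qsl, urlencode


def build_form_body(reg_no, name=""):
    """Form-encode the search fields the same way an HTML form post would."""
    fields = {"Name": name, "RegNo": reg_no, "FormsButton1": "Submit"}
    return urlencode(fields)


body = build_form_body(12345)
print(body)  # Name=&RegNo=12345&FormsButton1=Submit

# Decode it again to check every field survived, including the blank Name
print(parse_qsl(body, keep_blank_values=True))
```

If the body you see in the browser has extra fields (hidden inputs, for example), add them to the payload as well.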
Edit: multiple registration numbers, with the output collected in a list of lists
import io
import requests
from bs4 import BeautifulSoup
import pandas as pd
reg_no_list = [12345, 2345]
all_results = []
for reg_no in reg_no_list:
    # Form fields that the search page submits
    payload = {
        'Name': '',
        'RegNo': reg_no,
        'FormsButton1': 'Submit',
    }
    r = requests.post('http://www.mpmedicalcouncil.net/smr_database_search_result.asp', data=payload)
    soup = BeautifulSoup(r.text, 'html.parser')
    data_table = soup.find('table', attrs={'width': '590'})
    df = pd.read_html(io.StringIO(str(data_table)))[0]
    # Drop the first column (Sr. No.), which is not needed
    df_to_list = df.drop(df.columns[0], axis=1).values.tolist()
    # Skip the first item, which holds the column headers
    all_results.extend(df_to_list[1:])
for row in all_results:
    print(row)
Output
['Madhya Pradesh Medical Council', 'Gawali Deepak', 'Mr. Gajendra Singh Gawali', '19- New Five Brigade, Complex, Agar Road, Ujjain (MP) 456007', '12345', 'MBBS', '2011']
['Medical Council Bhopal', 'Maheshwari, Shyam Sunder', 'R.C. Maheshwari', 'Cement Road, At Post. Piparia DIt. Hoshangabad', '12345', 'MBBS', '1993']
['Mahakoshal Medical Council, Indore', 'Aterkar Ku Sunanda', 'Puushottam Rao Aterkar', 'Near Y Sadashiv Studio Jayendra Ganj Gwalior', '2345', 'MBBS', '1969']
['Medical Council Bhopal', 'Deljeet Singh Kindra', 'Shri jogendra Singh Kindra', 'C/o Shri J.S. Kindra Near Jhansigate, Bina DIt. Sagar', '2345', 'MBBS', '1979']
['Madhya Pradesh Medical Council', 'Gupta Shyam Sunder', 'Shri Ramesh Chand', 'C/o Parmand Medical Store Sheopur Kalan 476337', '2345', 'MBBS', '1999']
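If you want to keep the rows rather than just print them, one option is to write the list of lists out as CSV. This is only a sketch: `rows_to_csv` and `COLUMNS` are made-up names, and the "Relative" and "Address" labels are my guesses, because the printed header row above elides the middle columns.

```python
import csv
import io

# Placeholder labels. "Relative" and "Address" are guesses; the site's real
# header text for those two columns is not visible in the printed output.
COLUMNS = ["Council", "Name", "Relative", "Address",
           "Registration No.", "Qualification", "Year of Reg."]


def rows_to_csv(rows, columns=COLUMNS):
    """Return the rows as CSV text with a header line."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(columns)
    writer.writerows(rows)
    return buffer.getvalue()


# One row copied from the output above
sample_rows = [
    ["Madhya Pradesh Medical Council", "Gupta Shyam Sunder", "Shri Ramesh Chand",
     "C/o Parmand Medical Store Sheopur Kalan 476337", "2345", "MBBS", "1999"],
]
print(rows_to_csv(sample_rows))
```

To write a file instead, pass the returned text to `open("results.csv", "w", newline="").write(...)`, or hand `all_results` to the function in place of `sample_rows`.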