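Submitting a number through a form and clicking the submit button to scrape the resulting data with Python BeautifulSoup

Time: 2019-01-24 14:28:36

Tags: python-3.x beautifulsoup

Below is my code. The link does not change after the registration number is submitted, so I am stuck at this point. Could someone please help me get this working?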

from bs4 import BeautifulSoup
import requests

r = requests.get('http://www.mpmedicalcouncil.net/smr_database.html')
soup = BeautifulSoup(r.text, 'lxml')
links = soup.find('input', {"id": "FormsEditRegNo"})

2 answers:

Answer 0 (score: 0)

Use Selenium to send the input, then click Submit:

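from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.common.by import By

# Selenium 4 style: the driver path goes through Service, and elements are located with By.
driver = webdriver.Chrome(service=Service("C:/chromedriver_win32/chromedriver.exe"))
driver.get('http://www.mpmedicalcouncil.net/smr_database.html')
# Type the registration number into the search field.
driver.find_element(By.ID, "FormsEditRegNo").send_keys("123456789")
# Click the form's Submit button.
driver.find_element(By.CSS_SELECTOR, 'input[value="Submit"]').click()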

You can then parse the table with pandas, Selenium, or BeautifulSoup:

soup = BeautifulSoup(driver.page_source,'lxml')
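
As a rough sketch of the parsing step, the rows of a result table can be pulled out of the page source as plain lists. The `sample_html` string and the `table_to_rows` helper below are made up for illustration; the markup of the real page may differ, and in practice the string would be `driver.page_source`.

```python
from bs4 import BeautifulSoup

# Stand-in for driver.page_source; the real page's markup may differ.
sample_html = """
<table width="590">
  <tr><td>Sr. No.</td><td>Council</td><td>Registration No.</td></tr>
  <tr><td>1</td><td>Madhya Pradesh Medical Council</td><td>12345</td></tr>
</table>
"""


def table_to_rows(html):
    """Return the first table in html as a list of rows (lists of cell text)."""
    soup = BeautifulSoup(html, "html.parser")
    table = soup.find("table")
    if table is None:
        # No table on the page, e.g. the search returned nothing.
        return []
    return [
        [cell.get_text(strip=True) for cell in tr.find_all(["td", "th"])]
        for tr in table.find_all("tr")
    ]


rows = table_to_rows(sample_html)
for row in rows:
    print(row)
```

Returning an empty list when no table is found keeps the caller from hitting an `AttributeError` on `None`.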

Answer 1 (score: 0)

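I think Selenium may be overkill for this site. You can use requests to mimic the POST request that the page makes. The URL and form parameters can be found in the Network tab of the developer tools in most browsers; the form data is reproduced in the payload in the code below.

The POST request returns the result page as response text. You can then use BeautifulSoup or pandas (or both) to extract the data you need.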
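
If the Network tab is not at hand, the target URL and field names can usually also be read from the form markup itself. The sketch below runs on a hypothetical `sample_form` string: the field names and the action are taken from this thread, but the actual markup of the page is an assumption, as is the idea that the `FormsEditRegNo` input is the one named `RegNo`. The helper `describe_form` is likewise made up for illustration.

```python
from urllib.parse import urljoin

from bs4 import BeautifulSoup

# Hypothetical form markup; the real page may be structured differently.
sample_form = """
<form method="post" action="smr_database_search_result.asp">
  <input type="text" name="Name">
  <input type="text" name="RegNo" id="FormsEditRegNo">
  <input type="submit" name="FormsButton1" value="Submit">
</form>
"""


def describe_form(html, page_url):
    """Return the submit URL, method, and default field values of the first form."""
    soup = BeautifulSoup(html, "html.parser")
    form = soup.find("form")
    if form is None:
        return None
    # Collect every named input together with its default value.
    fields = {
        inp["name"]: inp.get("value", "")
        for inp in form.find_all("input")
        if inp.get("name")
    }
    return {
        # A relative action is resolved against the URL of the page.
        "url": urljoin(page_url, form.get("action", "")),
        "method": form.get("method", "get").lower(),
        "fields": fields,
    }


info = describe_form(sample_form, "http://www.mpmedicalcouncil.net/smr_database.html")
print(info)
```

The resulting `fields` dictionary can serve as a starting payload: fill in `RegNo` and pass it as `data` to `requests.post`.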

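from io import StringIO

import pandas as pd
import requests
from bs4 import BeautifulSoup

reg_no = 12345
# Form fields as sent by the search page
payload = {
    'Name': '',
    'RegNo': reg_no,
    'FormsButton1': 'Submit',
}
r = requests.post('http://www.mpmedicalcouncil.net/smr_database_search_result.asp', data=payload)
soup = BeautifulSoup(r.text, 'html.parser')
# The results table is the one with width="590"
data_table = soup.find('table', attrs={'width': '590'})
# Newer pandas versions expect a file-like object rather than a raw HTML string
print(pd.read_html(StringIO(str(data_table)))[0])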

Output

         0                               1                         2      ...                      5              6             7
0  Sr. No.                         Council                      Name      ...       Registration No.  Qualification  Year of Reg.
1        1  Madhya Pradesh Medical Council             Gawali Deepak      ...                  12345           MBBS          2011
2        2          Medical Council Bhopal  Maheshwari, Shyam Sunder      ...                  12345           MBBS          1993

[3 rows x 8 columns]
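
In the output above the column labels are the integers 0 to 7 and the real header text sits in row 0. One possible way to tidy that up is sketched here on a small hand-built frame; the `raw` data and the `promote_header` helper are illustrative only, and only three of the eight columns are reproduced.

```python
import pandas as pd

# Stand-in for pd.read_html(...)[0]: integer column labels, header text in row 0.
raw = pd.DataFrame([
    ["Sr. No.", "Council", "Registration No."],
    ["1", "Madhya Pradesh Medical Council", "12345"],
    ["2", "Medical Council Bhopal", "12345"],
])


def promote_header(df):
    """Use the first row as column labels and drop it from the data."""
    out = df.iloc[1:].reset_index(drop=True)
    out.columns = df.iloc[0].tolist()
    return out


clean = promote_header(raw)
print(clean)
```

Passing `header=0` to `pd.read_html` is another way to get a similar result at parse time.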

Edit: multiple registration numbers, with the output collected in a list of lists

from io import StringIO

import pandas as pd
import requests
from bs4 import BeautifulSoup

reg_no_list = [12345, 2345]
all_results = []
for reg_no in reg_no_list:
    payload = {
        'Name': '',
        'RegNo': reg_no,
        'FormsButton1': 'Submit',
    }
    r = requests.post('http://www.mpmedicalcouncil.net/smr_database_search_result.asp', data=payload)
    soup = BeautifulSoup(r.text, 'html.parser')
    data_table = soup.find('table', attrs={'width': '590'})
    # Newer pandas versions expect a file-like object rather than a raw HTML string
    df = pd.read_html(StringIO(str(data_table)))[0]
    # Drop the first column (Sr. No.), which is not needed
    df_to_list = df.drop(df.columns[0], axis=1).values.tolist()
    # Skip the first item, which holds the header row
    all_results.extend(df_to_list[1:])
for row in all_results:
    print(row)

Output

['Madhya Pradesh Medical Council', 'Gawali Deepak', 'Mr. Gajendra Singh Gawali', '19- New Five Brigade, Complex, Agar Road, Ujjain (MP) 456007', '12345', 'MBBS', '2011']
['Medical Council Bhopal', 'Maheshwari, Shyam Sunder', 'R.C. Maheshwari', 'Cement Road, At Post. Piparia DIt. Hoshangabad', '12345', 'MBBS', '1993']
['Mahakoshal Medical Council, Indore', 'Aterkar Ku Sunanda', 'Puushottam Rao Aterkar', 'Near Y Sadashiv Studio Jayendra Ganj Gwalior', '2345', 'MBBS', '1969']
['Medical Council Bhopal', 'Deljeet Singh Kindra', 'Shri jogendra Singh Kindra', 'C/o Shri J.S. Kindra Near Jhansigate, Bina DIt. Sagar', '2345', 'MBBS', '1979']
['Madhya Pradesh Medical Council', 'Gupta Shyam Sunder', 'Shri Ramesh Chand', 'C/o Parmand Medical Store Sheopur Kalan 476337', '2345', 'MBBS', '1999']
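
For further processing, the list of lists can be loaded back into a DataFrame with named columns. This is only a sketch: the three rows are copied from the output above, and the labels "Father's Name" and "Address" are guesses, since those two headers are elided in the printed table.

```python
import pandas as pd

# Three rows copied from the output above.
all_results = [
    ['Madhya Pradesh Medical Council', 'Gawali Deepak', 'Mr. Gajendra Singh Gawali',
     '19- New Five Brigade, Complex, Agar Road, Ujjain (MP) 456007', '12345', 'MBBS', '2011'],
    ['Medical Council Bhopal', 'Maheshwari, Shyam Sunder', 'R.C. Maheshwari',
     'Cement Road, At Post. Piparia DIt. Hoshangabad', '12345', 'MBBS', '1993'],
    ['Mahakoshal Medical Council, Indore', 'Aterkar Ku Sunanda', 'Puushottam Rao Aterkar',
     'Near Y Sadashiv Studio Jayendra Ganj Gwalior', '2345', 'MBBS', '1969'],
]

# "Father's Name" and "Address" are assumed labels; the other five appear in the printed header.
columns = ["Council", "Name", "Father's Name", "Address",
           "Registration No.", "Qualification", "Year of Reg."]

df = pd.DataFrame(all_results, columns=columns)
# The scraped values are strings; convert the year so it can be sorted or filtered numerically.
df["Year of Reg."] = df["Year of Reg."].astype(int)

# Group the doctors' names by registration number.
by_reg_no = df.groupby("Registration No.")["Name"].apply(list).to_dict()
print(by_reg_no)

# With no path argument, to_csv returns the CSV text instead of writing a file.
csv_text = df.to_csv(index=False)
```

The grouping makes it visible that one registration number can match several records issued by different councils.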