Getting results from a website with a POST request via web scraping

Time: 2018-10-26 19:52:30

Tags: python web-scraping beautifulsoup python-requests

Here is the link to the website I want to get data from: Public Search of Trademarks

To do this I need to fill in a form, but I want to submit that form using the Python requests library. I have written some code for this; take a look:

from bs4 import BeautifulSoup
import requests

def returnJson(wordmark, page_class):
    url = "http://ipindiaonline.gov.in/tmrpublicsearch/frmmain.aspx"
    search_type = 'WM'
    postdata = {
        'ctl00$ContentPlaceHolder1$DDLFilter': '0',
        'ctl00$ContentPlaceHolder1$DDLSearchType': search_type,
        'ctl00$ContentPlaceHolder1$TBWordmark': wordmark,
        'ctl00$ContentPlaceHolder1$TBClass': page_class,
    }
    r = requests.post(url, data=postdata)
    return r

def scrapping(r):
    soup = BeautifulSoup(r.text, 'html.parser')
    print(soup.prettify())
    # soup.find_all('p')

scrapping(returnJson('AIWA', '2'))

However, when I run this code, the response it returns is the same HTML as the original page, but I need the search results so that I can print them to the terminal.

Note: I inspected the POST request that the page sends, and based on that I built the post data as a dictionary.

Here is a screenshot of the file

Can anyone help me?

1 Answer:

Answer 0 (score: 0)

The POST needs some additional values for it to work. These can be obtained by first requesting the page without performing a search (if you are doing multiple searches, this may only need to be done once). For example:

from bs4 import BeautifulSoup
import requests

def returnJson(wordmark, page_class):
    url = "http://ipindiaonline.gov.in/tmrpublicsearch/frmmain.aspx"

    # Request the page once without searching, so we can read the hidden
    # ASP.NET form fields the server expects to be echoed back on a POST.
    r_init = requests.get(url)
    soup = BeautifulSoup(r_init.text, 'html.parser')
    event_validation = soup.find("input", attrs={"name": "__EVENTVALIDATION"})['value']
    view_state = soup.find("input", attrs={"name": "__VIEWSTATE"})['value']

    search_type = 'WM'

    postdata = {
        'ctl00$ContentPlaceHolder1$DDLFilter': '0',
        'ctl00$ContentPlaceHolder1$DDLSearchType': search_type,
        'ctl00$ContentPlaceHolder1$TBWordmark': wordmark,
        'ctl00$ContentPlaceHolder1$TBClass': page_class,
        # Echo the hidden fields back, and name the search button as the
        # event target so the server treats this POST as a search postback.
        '__EVENTVALIDATION': event_validation,
        '__EVENTTARGET': 'ctl00$ContentPlaceHolder1$BtnSearch',
        '__VIEWSTATE': view_state,
    }

    r = requests.post(url, data=postdata)
    return r

def scrapping(r):
    soup = BeautifulSoup(r.text, 'html.parser')
    print(soup.prettify())
    # soup.find_all('p')

scrapping(returnJson('AIWA', '2'))
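
The response to the POST is still an HTML page, so the results have to be parsed out of it rather than printed wholesale. Below is a minimal sketch of how that parsing might look; the assumption that each result sits in a table row (tr/td) is mine, since the result markup isn't shown in this post, so inspect the prettify() output above and adjust the selectors to match the real page:

from bs4 import BeautifulSoup

def print_results(r):
    soup = BeautifulSoup(r.text, 'html.parser')
    # Assumed structure: one result per table row. Replace this selector
    # with whatever the actual results markup uses.
    for row in soup.find_all('tr'):
        cells = [td.get_text(strip=True) for td in row.find_all('td')]
        if cells:  # skip header and empty rows
            print(' | '.join(cells))

print_results(returnJson('AIWA', '2'))

If the site turns out to depend on session cookies, making both the initial GET and the search POST through a single requests.Session() object (session.get(...) / session.post(...)) keeps the cookies consistent across the two requests.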