我要发送发帖请求的网址是http://www.hkexnews.hk/sdw/search/searchsdw.aspx
我要(手动)进行的搜索只需在“股票代码”中输入“ 1”,然后单击“搜索”
我已经尝试了很多次使用python和chrome扩展名“ Postman”发送带有以下标头的发布请求:
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8
Accept-Encoding: gzip, deflate
Accept-Language: zh-TW,zh;q=0.9,en-US;q=0.8,en;q=0.7
Cache-Control: max-age=0
Connection: keep-alive
Content-Length: 1844
Content-Type: application/x-www-form-urlencoded
Cookie: TS0161f2e5=017038eb490da17e158ec558c902f520903c36fad91e96a3b9ca79b098f2d191e3cac56652
Host: www.hkexnews.hk
Origin: http://www.hkexnews.hk
Referer: http://www.hkexnews.hk/sdw/search/searchsdw.aspx
Upgrade-Insecure-Requests: 1
User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10_12_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/67.0.3396.62 Safari/537.36
和以下参数:
today: 20180624
sortBy:
selPartID:
alertMsg:
ddlShareholdingDay: 23
ddlShareholdingMonth: 06
ddlShareholdingYear: 2018
txtStockCode: 00001
txtStockName:
txtParticipantID:
txtParticipantName:
btnSearch.x: 35
btnSearch.y: 8
但它不起作用。
答案 0 :(得分:2)
尝试以下方式。它应该获取您所需的响应以及该站点中根据搜索条件生成的表格数据。
import requests
from bs4 import BeautifulSoup
URL = "http://www.hkexnews.hk/sdw/search/searchsdw.aspx"
with requests.Session() as s:
s.headers={"User-Agent":"Mozilla/5.0"}
res = s.get(URL)
soup = BeautifulSoup(res.text,"lxml")
payload = {item['name']:item.get('value','') for item in soup.select("input[name]")}
payload['__EVENTTARGET'] = 'btnSearch'
payload['txtStockCode'] = '00001'
payload['txtParticipantID'] = 'A00001'
req = s.post(URL,data=payload,headers={"User-Agent":"Mozilla/5.0"})
soup_obj = BeautifulSoup(req.text,"lxml")
for items in soup_obj.select("#pnlResultSummary .ccass-search-datarow"):
data = [item.get_text(strip=True) for item in items.select("div")]
print(data)
答案 1 :(得分:0)
如果新闻站点提供了搜索API,并且您具有访问权限,则可以使用诸如Postman之类的工具来获取搜索结果。否则,您将抓取结果。
您提到的用例是典型的抓取。查看是否有搜索API,如果没有,请使用selenium
之类的内容来抓取结果。