I am trying to read a table from a website.
import requests
import pandas as pd
from bs4 import BeautifulSoup
#import html5lib
import csv
headers = {
'User-Agent':'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/85.0.4183.83 Safari/537.36',
'Cookie':'JSESSIONID=0000n4LQNXAwac4pFODK2Sbeuzm:1aka1kgm6',
'Accept':'text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.9',
'Accept-Encoding':'gzip, deflate, br',
'Accept-Language':'en-US,en;q=0.9,zh-TW;q=0.8,zh;q=0.7',
'Origin':'https://structurednotes-announce.tdcc.com.tw',
'Referer':'https://structurednotes-announce.tdcc.com.tw/Snoteanc/apps/bas/BAS210.jsp',
'Cache-Control':'max-age=0',
'Connection':'keep-alive',
'Content-Length':'277',
'Content-Type':'application/x-www-form-urlencoded',
'Host':'structurednotes-announce.tdcc.com.tw',
'Sec-Fetch-Dest':'document',
'Sec-Fetch-Mode':'navigate',
'Sec-Fetch-Site':'same-origin',
'Sec-Fetch-User':'?1',
'Upgrade-Insecure-Requests':'1'
}
#Query website
url= "https://structurednotes-announce.tdcc.com.tw/Snoteanc/apps/bas/BAS210.jsp"
attr= {'SALE_ORG_UUID': '-5afeb113:144566e3757:-6d01', 'agentDateStart':'2020/01/01', 'agentDateEnd': '2020/08/30', 'currentPage':'1','Last_Order_By':'FUND_NAME'}
response=requests.post(url,data=attr,headers = headers)
html_content=response.text
print(response.status_code)
soup = BeautifulSoup(html_content, "lxml")
print(soup.prettify())
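With the code above I get a 200 response code, but the returned page shows that the form was not submitted. I inspected the request in the browser's developer tools (F12) and found that one more attribute I have to submit is action: Q.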
SALE_ORG_UUID: -5afeb113:144566e3757:-6d01 <br>
agentDateStart: 2020/01/01 <br>
agentDateEnd: 2020/09/28 <br>
**action: Q** <br>
LAST_ORDER_BY: FUND_NAME <br>
currentPage: 1 <br>
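However, when I add it, I get a 500 response code instead, together with a "system error" message. Could you help me point out where I might be going wrong? All I want is to submit the request and read the document back. Thank you very much.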
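
For reference, here is a minimal sketch of what I could try next. It rests on guesses I have not confirmed for this server: that the hard-coded `Content-Length` of 277 no longer matches the body once `action` is added, that the copied `JSESSIONID` cookie may have expired, and that the field name should be `LAST_ORDER_BY` as shown in the developer tools rather than `Last_Order_By` as in my code. The helper names `preview_request` and `fetch_results` are made up for illustration. The sketch lets `requests` compute `Host`, `Content-Length`, and `Content-Type` itself, and uses a `requests.Session` so that a fresh cookie is picked up from an initial GET.

```python
import requests

URL = "https://structurednotes-announce.tdcc.com.tw/Snoteanc/apps/bas/BAS210.jsp"

# Form fields as listed in the browser's developer tools, including action=Q.
FORM = {
    "SALE_ORG_UUID": "-5afeb113:144566e3757:-6d01",
    "agentDateStart": "2020/01/01",
    "agentDateEnd": "2020/09/28",
    "action": "Q",
    "LAST_ORDER_BY": "FUND_NAME",
    "currentPage": "1",
}

# Assumption: these two headers are enough. Host, Content-Length and
# Content-Type are left out on purpose so that requests fills them in.
HEADERS = {
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64)",
    "Referer": URL,
}


def preview_request(session):
    """Build the POST without sending it, to inspect what would go out."""
    request = requests.Request("POST", URL, data=FORM, headers=HEADERS)
    return session.prepare_request(request)


def fetch_results(session):
    """GET the page first for a fresh session cookie, then POST the form."""
    session.get(URL, headers=HEADERS, timeout=30)
    response = session.post(URL, data=FORM, headers=HEADERS, timeout=30)
    response.raise_for_status()  # raises requests.HTTPError on a 4xx/5xx
    return response.text


if __name__ == "__main__":
    with requests.Session() as session:
        prepared = preview_request(session)
        # Compare the computed length with the hard-coded 277.
        print(prepared.headers.get("Content-Length"), len(prepared.body))
        html = fetch_results(session)
        print(html[:500])
```

If the computed `Content-Length` differs from 277, that mismatch alone might explain the server error, though I cannot be sure. The returned `html` can be passed to `BeautifulSoup(html, "lxml")` exactly as in my original code.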