Beautiful Soup: getting HTML form data after a form submission

Date: 2020-03-03 09:12:01

Tags: html beautifulsoup

I have a link that is the result of an HTML form submission:

https://www.taxpayerservicecenter.com/RP_Detail.jsp?ssl=4204%20%20%20%200084

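This is a public tax record. I want to get all of the data in the table that is returned.

Using Inspect in the browser, I believe the table data sits inside an element like this: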

<form action="./RP_Results.jsp" id="SearchForm" method="post" name="SearchForm" onsubmit="return validateForm(document.SearchForm)">

When I use Beautiful Soup, I can't seem to access this td class. Here is what I have:

from bs4 import BeautifulSoup
import requests

page = requests.get("https://www.taxpayerservicecenter.com/RP_Detail.jsp?ssl=4204%20%20%20%200084")
page

soup = BeautifulSoup(page.content,'lxml')

soup

Does anyone know how to get this table data? The code above is what I have tried.

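1 Answer:

Answer 0 (score: 1)

You need to set the JSESSIONID cookie header on the GET request in order to "see" the table. Without it, the server does not associate the request with the session created by the form submission.

Modify your GET request as follows: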

page = requests.get(url, headers={
    'Cookie': 'JSESSIONID=11qfsCuAhlev3j943gEn8bf-CBfH8Ta_z858JNR9w__7PJOfxkWr!-965451614'
})

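Note: you can get the JSESSIONID from the Network tab of the Chrome / Firefox developer tools by clicking on the first request and reading its Cookie header. The value is tied to a session, so expect to copy a fresh one when it expires.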
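
Putting the two steps together, here is a minimal sketch of sending the cookie and then reading the table out of the response. It is untested against the live site and makes some assumptions: the function names fetch_detail_html and extract_table_rows are made up for illustration, the session ID is a placeholder you must replace with the value from your own browser, and the page is assumed to hold its data in ordinary tr / td elements. It passes the cookie through the cookies= argument of requests.get, which should be equivalent to setting the Cookie header by hand, and it uses the built-in html.parser so that lxml is not required.

```python
from bs4 import BeautifulSoup
import requests

# URL taken from the question.
DETAIL_URL = "https://www.taxpayerservicecenter.com/RP_Detail.jsp?ssl=4204%20%20%20%200084"


def fetch_detail_html(session_id, url=DETAIL_URL):
    """Request the detail page, sending the JSESSIONID cookie with it.

    session_id is a placeholder: copy the current value from the
    browser's developer tools (hypothetical helper, not part of requests).
    """
    response = requests.get(url, cookies={"JSESSIONID": session_id}, timeout=30)
    response.raise_for_status()  # fail loudly on 4xx / 5xx responses
    return response.text


def extract_table_rows(html):
    """Return every non-empty table row in the page as a list of cell strings.

    Assumes the data lives in plain <tr>/<td>/<th> markup. If the page nests
    tables, cells of inner tables will also show up in their outer row.
    """
    soup = BeautifulSoup(html, "html.parser")
    rows = []
    for tr in soup.find_all("tr"):
        cells = [cell.get_text(strip=True) for cell in tr.find_all(["td", "th"])]
        if cells:  # skip rows that contain no cells at all
            rows.append(cells)
    return rows
```

Usage would look like rows = extract_table_rows(fetch_detail_html("paste-your-JSESSIONID-here")), after which each entry of rows is one table row. If the result is empty, print the returned HTML first to check whether the server actually sent the table or a page asking you to search again.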