I'm sending a POST request to filter by ID and then parsing the output. I need to grab everything under the colspan selector (the desired output is shown at the end). Under colspan="4" there are many b tags and a table tag containing tbody > tr > td, but my script only returns the contents of the b tags.

URL: https://e-mehkeme.gov.az/Public/Cases

Desired output:
import requests
from bs4 import BeautifulSoup as bs

request_headers = {
    'authority': 'e-mehkeme.gov.az',
    'method': 'POST',
    'path': '/Public/Cases',
    'scheme': 'https',
    'accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8,'
              'application/signed-exchange;v=b3',
    'accept-encoding': 'gzip, deflate, br',
    'accept-language': 'en,en-GB;q=0.9',
    'cache-control': 'max-age=0',
    'content-length': '66',
    'content-type': 'application/x-www-form-urlencoded',
    'origin': 'https://e-mehkeme.gov.az',
    'referer': 'https://e-mehkeme.gov.az/Public/Cases',
    'upgrade-insecure-requests': '1',
    'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) '
                  'Chrome/75.0.3770.142 Safari/537.36',
}

voens = {
    '1303450301',
    '1700393071',
    '2002283071',
}

form_data = {
    'CourtId': '',
    'CaseNo': '',
    'DocFin': '',
    'DocSeries': '',
    'DocNumber': '',
    'VOEN': voens,
    'button': 'Search',
}

url = 'https://e-mehkeme.gov.az/Public/Cases?courtid='
response = requests.post(url, data=form_data, headers=request_headers)
s = bs(response.content, 'lxml')

# PRINT THE HEADERS!
sHeader = s.findAll('tr', {'class': 'centeredheader'})[0]
headers = [sHeader.get_text().strip()]
print(headers)

# PRINT THE CONTENTS OF EACH SEARCH!
for voen in voens:
    form_data['VOEN'] = voen
    idData = [string for string in s.select("td", colspan_="4")]
    print(idData)
Answer 0 (score: 1):
You need to POST inside the loop with each updated VOEN value, extract the case IDs from the results, and then issue a new request for each ID.
import re
import requests
from bs4 import BeautifulSoup as bs

data = {'VOEN': '', 'button': 'Search'}
voens = ['1303450301', '1700393071', '2002283071']

for voen in voens:
    data['VOEN'] = voen
    r = requests.post('https://e-mehkeme.gov.az/Public/Cases', data=data)
    soup = bs(r.text, 'lxml')
    # each result row carries its case id in an element with class "casedetail"
    ids = [i['value'] for i in soup.select('.casedetail')]
    for i in ids:
        r = requests.get(f'https://e-mehkeme.gov.az/Public/CaseDetail?caseId={i}')
        soup = bs(r.content, 'lxml')
        print([re.sub(r'[\n\r]', '', i.text.strip()) for i in soup.select('[colspan="4"]')])
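The key fix in the answer is the CSS attribute selector '[colspan="4"]': `select()` takes a CSS selector string, not a `colspan_=` keyword like `find_all()` would accept for ordinary attributes. A minimal offline sketch (with hypothetical markup, not the live site's HTML) showing the selector matching any cell carrying that attribute:

```python
from bs4 import BeautifulSoup

# Hypothetical table fragment standing in for the case-detail page.
html = """
<table>
  <tr><td colspan="4"><b>Case detail text</b></td></tr>
  <tr><td>other cell</td></tr>
</table>
"""

soup = BeautifulSoup(html, 'html.parser')
# CSS attribute selector: matches only the td whose colspan equals "4".
cells = soup.select('td[colspan="4"]')
print([c.get_text(strip=True) for c in cells])  # ['Case detail text']
```

The same selector works without the tag name ('[colspan="4"]'), which is what the answer uses to catch every element with that attribute regardless of tag.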