In Python 3, I want to extract information from a site and store it in variables.
For example, from the "Dados do processo" block I want to store:
"Indenização por Dano Moral"
"Direito de Imagem"
"Violeta Miera Arriba"
"R$ 38.160,00"
To isolate that block:
from bs4 import BeautifulSoup
import requests
link = 'https://esaj.tjsp.jus.br/cpopg/show.do?processo.codigo=01001DTQA0000&processo.foro=1&uuidCaptcha=sajcaptcha_380320b510ee415ca0ca56cfac794999'
try:
    res = requests.get(link, verify=False)  # avoid SSLError
except requests.exceptions.RequestException as e:
    # RequestException is the base class of HTTPError, ConnectionError and Timeout
    print(str(e))
    raise SystemExit(1)  # without a response, res would be undefined below
soup = BeautifulSoup(res.text, "lxml")
# The second table with class "secaoFormBody" holds the "Dados do processo" block
janela1 = soup.find_all("table", {"class": "secaoFormBody"})[1]
dados_processo = janela1.find_all("tr", {"class": ""})
For example, the information "Indenização por Dano Moral" appears in dados_processo as:
<tr class="">
<td id="" valign="" width="150">
<label class="labelClass" for="" style="text-align:right;font-weight:bold;;">Assunto:</label>
</td>
<td valign="">
<span class="" id="">Indenização por Dano Moral</span>
</td>
</tr>
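Does anyone know how to reach the <span class="" id="">? I can't get to it because the same pattern repeats at several points in the block, always with "" for the class and "" for the id.
I thought about searching for the string "Assunto:" in <label class="labelClass" for=""> and, if it is found, taking the string from the <span class="" id=""> that follows. This check would be useful because some similar pages may not contain all of the items.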
Answer 0 (score: 2)
You can use :contains to target the "heading" label, then the adjacent sibling (+) combinator to get the td that contains the value of interest. This uses bs4 4.7.1.
from bs4 import BeautifulSoup as bs
import requests
import urllib3; urllib3.disable_warnings(urllib3.exceptions.InsecureRequestWarning)
r = requests.get('https://esaj.tjsp.jus.br/cpopg/show.do?processo.codigo=01001DTQA0000&processo.foro=1&uuidCaptcha=sajcaptcha_380320b510ee415ca0ca56cfac794999', verify=False)
soup = bs(r.content, 'lxml')
print(soup.select_one('td:has(>.labelClass:contains("Assunto:")) + td').text.strip())
print(soup.select_one('td:has(>.labelClass:contains("Outros assuntos:")) + td').text.strip())
print(soup.select_one('td:has(>.labelClass:contains("Juiz:")) + td').text.strip())
print(soup.select_one('td:has(>.labelClass:contains("Valor da ação:")) + td').text.strip())
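The selector can also be tried offline against the HTML fragment from the question, which avoids the network call while experimenting. This is a minimal sketch with two assumptions not in the original: the fragment is wrapped in a <table> so it parses on its own, and the built-in html.parser is used instead of lxml. It also uses :-soup-contains, the name that newer Soup Sieve releases (2.1 and later) prefer over the deprecated :contains alias.

```python
from bs4 import BeautifulSoup

# Fragment copied from the question, wrapped in a <table> (assumption)
# so that it can be parsed on its own.
HTML = """
<table>
<tr class="">
<td id="" valign="" width="150">
<label class="labelClass" for="" style="text-align:right;font-weight:bold;;">Assunto:</label>
</td>
<td valign="">
<span class="" id="">Indenização por Dano Moral</span>
</td>
</tr>
</table>
"""

soup = BeautifulSoup(HTML, "html.parser")  # html.parser ships with Python

# 1. td:has(> .labelClass:-soup-contains(...)) matches the td holding the label
# 2. "+ td" steps to the td immediately after it, which holds the value
cell = soup.select_one('td:has(> .labelClass:-soup-contains("Assunto:")) + td')
assunto = cell.get_text(strip=True)
print(assunto)
```

The match is case-sensitive, so "Assunto:" does not accidentally pick up the "Outros assuntos:" row.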
You can use an if to test whether the element exists, just in case:
soup.select_one('td:has(>.labelClass:contains("Assunto:")) + td').text.strip() if soup.select_one('td:has(>.labelClass:contains("Assunto:")) + td') is not None else 'N/A'
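Since the conditional above runs the same selector twice, the existence check could be wrapped in a small helper. The sketch below is only one way to do it: get_field, its default parameter, and the sample HTML are hypothetical names and data introduced for illustration, and :-soup-contains assumes Soup Sieve 2.1 or later (on older versions, :contains plays the same role).

```python
from bs4 import BeautifulSoup

# Hypothetical fragment modelled on the markup in the question;
# note that it has no "Juiz:" row.
HTML = """
<table class="secaoFormBody">
  <tr class="">
    <td width="150"><label class="labelClass">Assunto:</label></td>
    <td><span class="" id="">Indenização por Dano Moral</span></td>
  </tr>
  <tr class="">
    <td width="150"><label class="labelClass">Valor da ação:</label></td>
    <td><span class="" id="">R$ 38.160,00</span></td>
  </tr>
</table>
"""

def get_field(soup, label, default="N/A"):
    """Return the text of the td next to the given label, or default if absent."""
    cell = soup.select_one(f'td:has(> .labelClass:-soup-contains("{label}")) + td')
    return cell.get_text(strip=True) if cell is not None else default

soup = BeautifulSoup(HTML, "html.parser")
labels = ["Assunto:", "Juiz:", "Valor da ação:"]
# Look up every label once; missing rows fall back to the default.
fields = {label: get_field(soup, label) for label in labels}
print(fields)
```

Collecting the results in a dict keeps each value under its label, which helps when similar pages omit some of the rows.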