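I have the following page: https://www.medline.com/sku/item/MDPMDS89740KDMB?skuIndex=S1&question=&flowType=browse&indexCount=1 and I want to extract the manufacturer name given in the table. I wrote the code below to get it, but the path does not seem to be correct. I am not sure how to get only one specific td value from the table.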
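import requests
from bs4 import BeautifulSoup

def name(url):
    source_code = requests.get(url)
    plain_text = source_code.text
    soup = BeautifulSoup(plain_text, 'html.parser')
    table = soup.find("table", {"class": "medSKUTableDetails"})
    # find('td') returns only the first <td> in the table,
    # which is not necessarily the manufacturer cell.
    mnf = table.find('td')
    if mnf:
        print(mnf.string)
    else:
        print("issue")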
Answer 0 (score: 3)
You can search for the <td> tag that contains the text "Manufacturer", and then find the next <td>, which holds the manufacturer name.
For example:
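import requests
from bs4 import BeautifulSoup

url = 'https://www.medline.com/sku/item/MDPMDS89740KDMB?skuIndex=S1&question=&flowType=browse&indexCount=1'
soup = BeautifulSoup(requests.get(url).content, 'html.parser')

# Select the first <td> whose text contains "Manufacturer".
# ":-soup-contains" needs soupsieve 2.1 or newer; older versions spell it ":contains".
manufacturer = soup.select_one('td:-soup-contains("Manufacturer")')
if manufacturer:
    # The next <td> in document order holds the manufacturer name.
    manufacturer = manufacturer.find_next('td').text
else:
    manufacturer = 'Not Found'
print(manufacturer)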
Prints:
Medline
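
The same label-then-next-cell lookup can be tried without any network access. The following is only a minimal sketch: `SAMPLE_HTML` is made-up markup standing in for the real page (whose actual structure may differ), `find_value_after_label` is a hypothetical helper name, and it uses `find` with a `string` pattern instead of the CSS selector shown above.

```python
import re
from bs4 import BeautifulSoup

# Hypothetical markup standing in for the product page's details table;
# the real page may be structured differently.
SAMPLE_HTML = """
<table class="medSKUTableDetails">
  <tr><td>Manufacturer</td><td>Medline</td></tr>
  <tr><td>Latex Free</td><td>Yes</td></tr>
</table>
"""

def find_value_after_label(html, label):
    """Return the text of the <td> that follows the first <td> containing `label`."""
    soup = BeautifulSoup(html, "html.parser")
    # string= matches only cells whose sole child is a text node.
    label_cell = soup.find("td", string=re.compile(re.escape(label)))
    if label_cell is None:
        return None
    # find_next walks forward in document order to the next <td>.
    value_cell = label_cell.find_next("td")
    return value_cell.get_text(strip=True) if value_cell else None

manufacturer = find_value_after_label(SAMPLE_HTML, "Manufacturer")
missing = find_value_after_label(SAMPLE_HTML, "Warranty")
print(manufacturer)
```

Returning `None` for a missing label, rather than a placeholder string such as 'Not Found', lets the caller tell a missing cell apart from a real value.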