Extracting information with Beautiful Soup

Time: 2019-12-09 22:39:15

Tags: python python-3.x beautifulsoup web-crawler

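I am trying to use Beautiful Soup to extract each chemical's name, its occurrence(s)/use(s), and the date it was added. Here is an example of one chemical from the list: https://oehha.ca.gov/chemicals/abiraterone-acetate

Can someone help me? Thank you very much!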

My desired output would be the chemical's name, its occurrence(s)/use(s), and the date it was added.

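1 answer:

Answer 0: (score: 0)

Note that the site is protected by an Incapsula firewall, which is meant to block bots and browser automation.

With Selenium, we can achieve your goal as follows: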

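from selenium import webdriver
from bs4 import BeautifulSoup

url = 'https://oehha.ca.gov/chemicals/abiraterone-acetate'

# Load the page in a Firefox session driven by Selenium.
browser = webdriver.Firefox()
try:
    browser.get(url)  # get() returns None, so there is nothing to assign
    source = browser.page_source
finally:
    browser.quit()  # end the browser session even if loading fails

soup = BeautifulSoup(source, 'html.parser')

# Chemical name
title = soup.find('h1', {'class': 'title'})
print(title.text.strip())

# First child of the paragraph that follows the label; the label is
# matched exactly, including the spelling 'Occurence'.
details = soup.find(string='Occurence(s)/Use(s)').find_next('p').contents[0]
print(details)

# Date the chemical was added
date = soup.find('span', {'class': 'date-display-single'})
print(date.text)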

Output:

Abiraterone acetate
A CYP17 inhibitor indicated in combination with prednisone for the treatment of patients with metastatic castration-resistant prostate cancer.
02/02/2016
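The lookups above assume every element is present. If one is missing, `find` returns `None` and the attribute access that follows raises `AttributeError`. This could matter when the same code is run over a whole list of chemicals. Below is a minimal sketch of a more defensive version. The function name `parse_chemical` and the `SAMPLE_HTML` snippet are hypothetical: the snippet is a simplified stand-in used only for illustration, and the real page markup may differ.

```python
from bs4 import BeautifulSoup

# Hypothetical, simplified stand-in for a chemical page (not the site's real markup).
SAMPLE_HTML = """
<html><body>
  <h1 class="title"> Abiraterone acetate </h1>
  <h2>Occurence(s)/Use(s)</h2>
  <p>A CYP17 inhibitor.</p>
  <span class="date-display-single">02/02/2016</span>
</body></html>
"""


def parse_chemical(html):
    """Return name, use and date from one page; missing fields become None."""
    soup = BeautifulSoup(html, "html.parser")

    title = soup.find("h1", {"class": "title"})
    # Exact match on the label text, spelled as in the answer's code.
    label = soup.find(string="Occurence(s)/Use(s)")
    paragraph = label.find_next("p") if label else None
    date = soup.find("span", {"class": "date-display-single"})

    # Unlike .contents[0] above, get_text() returns the paragraph's full text.
    return {
        "name": title.get_text(strip=True) if title else None,
        "use": paragraph.get_text(strip=True) if paragraph else None,
        "date_added": date.get_text(strip=True) if date else None,
    }


record = parse_chemical(SAMPLE_HTML)
print(record)
```

To use it with Selenium, pass `browser.page_source` to `parse_chemical` for each chemical URL and collect the returned dictionaries in a list.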