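I am trying to use Beautiful Soup to extract each chemical's name, its occurrence(s)/use(s), and the date it was added. Here is an example of one chemical from the list: https://oehha.ca.gov/chemicals/abiraterone-acetate
Can someone help me? Thank you very much!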
My desired output is the chemical's name, its occurrence(s)/use(s), and the date it was added.
Answer 0 (score: 0)
Note that the site is protected by the Incapsula firewall, which is meant to block bots and browser automation.
Using Selenium, you can achieve your goal as follows:
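from selenium import webdriver
from bs4 import BeautifulSoup

url = 'https://oehha.ca.gov/chemicals/abiraterone-acetate'

# Load the page in a real browser and grab the rendered HTML.
browser = webdriver.Firefox()
try:
    browser.get(url)  # get() returns None, so there is nothing to assign
    source = browser.page_source
finally:
    browser.quit()  # end the WebDriver session even if loading fails

soup = BeautifulSoup(source, 'html.parser')

# Chemical name
title = soup.find('h1', {'class': 'title'})
print(title.text.strip())

# First node of the paragraph after the label; the label must match the
# page text exactly, so its spelling is left as in the original answer.
details = soup.find(string='Occurence(s)/Use(s)').find_next('p').contents[0]
print(details)

# Date added
date = soup.find('span', {'class': 'date-display-single'})
print(date.text)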
Output:
Abiraterone acetate
A CYP17 inhibitor indicated in combination with prednisone for the treatment of patients with metastatic castration-resistant prostate cancer.
02/02/2016
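If the firewall returns something other than the chemical page, soup.find will return None and the lookups above will raise AttributeError. Below is a minimal sketch of a more defensive parsing step, assuming the same selectors as the answer. The helper name extract_chemical and the dictionary keys are hypothetical, chosen only for illustration. Unlike the answer, it returns the full text of the paragraph rather than only its first node.

```python
from bs4 import BeautifulSoup


def extract_chemical(html):
    """Return name, use, and date from page HTML; missing fields are None.

    Hypothetical helper: the selectors are copied from the answer above.
    """
    soup = BeautifulSoup(html, 'html.parser')

    def text_or_none(tag):
        # soup.find returns None when nothing matches, so guard before reading text
        return tag.get_text(strip=True) if tag is not None else None

    title = soup.find('h1', {'class': 'title'})
    date = soup.find('span', {'class': 'date-display-single'})

    # The label spelling ('Occurence') is copied from the answer above.
    label = soup.find(string='Occurence(s)/Use(s)')
    use = label.find_next('p') if label is not None else None

    return {
        'name': text_or_none(title),
        'use': text_or_none(use),
        'date_added': text_or_none(date),
    }
```

It can be called as extract_chemical(browser.page_source). A result whose values are all None suggests the browser did not receive the real page.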