Parsing CSS selectors with BeautifulSoup

Date: 2018-03-11 01:36:03

Tags: python html css beautifulsoup

I am scraping data from the web with the beautifulsoup 4 module, using CSS selectors.

See the example code:

# import the libraries used below
import bs4
import requests

# pull website
res = requests.get('https://dailystoic.com/epictetus/')

# parse the HTML
soup = bs4.BeautifulSoup(res.text, 'html.parser')

# CSS selector
elems = soup.select('body > div.wrap.container > div > main > article > div.entry-content > p:nth-child(1) > em > a:nth-child(3)')

# take content and store in variable
content = elems[0].text.strip()

# print content
print(content)

I want the text inside the hyperlink: not the URL, but what the hyperlink says.

1 answer:

Answer 0 (score: 0)

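Use :nth-of-type() instead of :nth-child(). Beautiful Soup 4 releases older than 4.7.0 implement only the :nth-of-type pseudo-class in select(), so a selector containing :nth-child fails with NotImplementedError.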

import bs4, requests
res = requests.get('https://dailystoic.com/epictetus/')
soup = bs4.BeautifulSoup(res.text, 'html.parser')
elems = soup.select('body > div.wrap.container > div > main > article > div.entry-content > p:nth-of-type(1) > em > a:nth-of-type(3)')
print(elems[0].text)

.text gets what the hyperlink says, i.e. the link text. If you need the URL, use: elems[0].attrs['href']

Output:

Epictetus
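
The difference between the link text and the URL can also be checked offline. The following is only a minimal sketch: the HTML fragment and the example.com URLs are made up for illustration and are not the markup of the real page.

```python
import bs4

# Hypothetical HTML fragment standing in for the real page (assumption).
html = """
<div class="entry-content">
  <p><em>
    <span>Read about</span>
    <a href="https://example.com/seneca">Seneca</a>
    <a href="https://example.com/marcus">Marcus Aurelius</a>
    <a href="https://example.com/epictetus">Epictetus</a>
  </em></p>
</div>
"""

soup = bs4.BeautifulSoup(html, 'html.parser')

# :nth-of-type(3) counts only sibling <a> tags, so the leading <span> is ignored.
link = soup.select('div.entry-content > p:nth-of-type(1) > em > a:nth-of-type(3)')[0]

link_text = link.text.strip()   # what the hyperlink says
link_url = link.attrs['href']   # where the hyperlink points

print(link_text)
print(link_url)
```

In this fragment the third <a> is the fourth child of <em>, because the <span> comes first. In a Beautiful Soup release that supports :nth-child, a:nth-child(3) would therefore match the "Marcus Aurelius" link instead, which is another reason to be careful when swapping one pseudo-class for the other.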