Question

I am scraping data from the web with the Beautiful Soup 4 module, using CSS selectors.

See the example code:

import bs4, requests

# fetch the page
res = requests.get('https://dailystoic.com/epictetus/')

# parse the HTML
soup = bs4.BeautifulSoup(res.text, 'html.parser')

# select the target link with a CSS selector
elems = soup.select('body > div.wrap.container > div > main > article > div.entry-content > p:nth-child(1) > em > a:nth-child(3)')

# take the text of the first match and store it in a variable
content = elems[0].text.strip()

# print the content
print(content)

I want the text of the hyperlink: not the URL, but what the link says.

Answer 1

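Use :nth-of-type() instead of :nth-child(). :nth-child(n) matches an element only if it is the n-th child of its parent, counting siblings of every tag, whereas :nth-of-type(n) counts only siblings with the same tag name.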

import bs4, requests
res = requests.get('https://dailystoic.com/epictetus/')
soup = bs4.BeautifulSoup(res.text, 'html.parser')
elems = soup.select('body > div.wrap.container > div > main > article > div.entry-content > p:nth-of-type(1) > em > a:nth-of-type(3)')
print(elems[0].text)

.text returns what the hyperlink says, i.e. the link text. If you need the URL instead, use: elems[0].attrs['href']

Output:

Epictetus
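
To see the difference between the two pseudo-classes without hitting the network, here is a minimal sketch on an inline HTML snippet. The markup is made up for illustration (it is not copied from the dailystoic.com page, whose real structure may differ), and the names by_child and by_type are just placeholders.

```python
import bs4

# Hypothetical markup: a <span> sits between the links inside <em>,
# so "third child" and "third <a>" are different elements.
html = """
<div class="entry-content">
  <p><em>
    <a href="/seneca/">Seneca</a>
    <span>and</span>
    <a href="/marcus-aurelius/">Marcus Aurelius</a>
    <a href="/epictetus/">Epictetus</a>
  </em></p>
</div>
"""

soup = bs4.BeautifulSoup(html, "html.parser")

# :nth-child(3) needs the <a> to be the 3rd child element of <em>, whatever the tags before it.
by_child = soup.select("div.entry-content > p > em > a:nth-child(3)")

# :nth-of-type(3) needs the <a> to be the 3rd <a> among its siblings.
by_type = soup.select("div.entry-content > p > em > a:nth-of-type(3)")

child_text = by_child[0].get_text(strip=True)  # link text of the 3rd child
type_text = by_type[0].get_text(strip=True)    # link text of the 3rd <a>
type_href = by_type[0]["href"]                 # the URL, if that is what you need

print(child_text)
print(type_text)
print(type_href)
```

In this snippet the two selectors pick different links because the <span> shifts the child positions. If a selector matches nothing, select() returns an empty list, so elems[0] raises IndexError; checking the list first is safer on a live page.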

Parsing CSS selectors with BeautifulSoup
