Question

<a href="link" target="_blank" class="xXx text-user topic-author" sl-processed="1">
    diamonds
</a>

I would like to extract the username (pseudo) 'diamonds', which is inside the 'a' tag, using BeautifulSoup.

I have tried many things, but it always returns None.

What I thought should work was this:

 txt = soup.find('a', {'class': 'xXx text-user topic-author'})
 print (txt)

Answer 1

It looks like the author's CSS classes are not the same across the whole page, so you need to do some filtering.

The author elements have several CSS classes, but they have something in common.
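A minimal sketch of what "something in common" buys you, using a small hand-written HTML sample (hypothetical; the extra classes `abc123` and `def456` are invented, not copied from the real page). When you pass a single class name, BeautifulSoup matches any tag whose class list contains that name, so the differing extra classes and their order do not get in the way.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: each author span carries different extra classes,
# but all of them share the class "topic-author".
html = """
<span class="JvCare abc123 text-user topic-author">diamonds</span>
<span class="JvCare def456 topic-author text-user">rubies</span>
<span class="JvCare other-link">not an author</span>
"""

soup = BeautifulSoup(html, "html.parser")

# Matching on one class checks membership in each tag's class list,
# so the other classes and their order do not matter.
authors = [tag.get_text(strip=True) for tag in soup.find_all("span", class_="topic-author")]
print(authors)  # ['diamonds', 'rubies']
```

Passing the full class string instead (as in the question) only matches a tag whose class attribute is exactly that string, which is brittle when the classes vary from one element to the next.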

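The following code prints out the authors. It first grabs the elements the author names are in. The problem is that this CSS class (JvCare) is used for a lot of things: counting those elements on the page gives 98, but there are only 25 author names, so some filtering is needed afterwards.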

import requests
from bs4 import BeautifulSoup

url = "http://www.jeuxvideo.com/forums/0-7059-0-1-0-1-0-another-war.htm"
r = requests.get(url)
soup = BeautifulSoup(r.text, "html.parser")
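# Every author name sits in a <span> with the class "JvCare",
# but that class is also used for many other elements.
JvCs = soup.find_all('span', attrs={'class': 'JvCare'})
for j in JvCs:
    # j['class'] is the list of all CSS classes on this element;
    # keep only the elements marked as the topic author.
    if 'topic-author' in j['class']:
        print(j.text.strip())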

j['class'] returns, for each of the 98 elements in the JvCs list, the list of its CSS classes. The element that holds the author name has a CSS class called 'topic-author'.

So we simply check whether 'topic-author' is in the list that j['class'] returns for each of the 98 elements. If it is, we print the author name.
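To try this filtering step without fetching the live page, here is a sketch that runs the same loop on a small hypothetical stand-in for the forum HTML (the `topic-date` and `topic-count` classes are invented for illustration). It shows that `j['class']` is a list, because BeautifulSoup treats `class` as a multi-valued attribute, and that the membership test keeps only the author spans.

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in for the forum page: several spans share the
# generic "JvCare" class, but only some of them are author names.
html = """
<span class="JvCare xXx text-user topic-author">diamonds</span>
<span class="JvCare topic-date">12/03</span>
<span class="JvCare yYy topic-author">rubies</span>
<span class="JvCare topic-count">42</span>
"""

soup = BeautifulSoup(html, "html.parser")
JvCs = soup.find_all("span", attrs={"class": "JvCare"})

authors = []
for j in JvCs:
    # "class" is multi-valued, so j["class"] is a list such as
    # ['JvCare', 'xXx', 'text-user', 'topic-author'].
    if "topic-author" in j["class"]:
        authors.append(j.text.strip())

print(len(JvCs), authors)  # 4 ['diamonds', 'rubies']
```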

Hope this helps you get further.

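EDIT: For cases involving two or more CSS classes, there seems to be a smarter way (mentioned in the excellent BeautifulSoup docs). In those cases the docs suggest using the .select method. In your case it would look like this: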

author_list = soup.select('span.JvCare.topic-author')
for author in author_list:
    print(author.text.strip())
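A quick offline check of the selector, again on hypothetical markup: `span.JvCare.topic-author` only matches a `<span>` that carries both classes, in any order and alongside any other classes, so a span with just `topic-author` is skipped.

```python
from bs4 import BeautifulSoup

# Hypothetical markup; only spans carrying BOTH classes count as authors.
html = """
<span class="JvCare xXx text-user topic-author">diamonds</span>
<span class="JvCare topic-date">12/03</span>
<span class="topic-author">no JvCare class</span>
<span class="topic-author yYy JvCare">rubies</span>
"""

soup = BeautifulSoup(html, "html.parser")

# Chained class selectors require every listed class to be present.
author_list = soup.select("span.JvCare.topic-author")
for author in author_list:
    print(author.text.strip())
```

This replaces the manual membership loop with a single CSS selector. For the `<a>` snippet in the question, the analogous call would be `soup.select_one('a.topic-author')`, assuming the live page really uses an `<a>` tag there.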

Python & BeautifulSoup : How to extract a tags' value which is in many others tags?
