Question

I'm trying to get every "a" tag in an HTML page, and I'm trying to do it with soup.find_all.

Here is my code:

# r.text is the HTML of the YouTube home page
soup = BeautifulSoup(r.text, 'html.parser')
for lnk in soup.find_all('a', {'class': 'ytd-thumbnail'}):
    print(lnk)
    link = lnk.get("href")
    writeFile("queue.txt", "https://youtube.com" + link)
    removeQueue(url)

I'm trying to get something like this:

<a id="thumbnail" class="yt-simple-endpoint inline-block style-scope ytd-thumbnail" aria-hidden="true" tabindex="-1" href="youtubelink">

But it never even enters the for loop, and I don't know why.

Answer 1

When calling find_all, pass the class filter either through the class_ keyword or as a dictionary through attrs.
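As a rough sketch of the two forms (the class_ keyword and an attrs dictionary), assuming a small inline sample modeled on the tag from the question rather than the real YouTube page; the href values are placeholders:

```python
from bs4 import BeautifulSoup

# Sample markup modeled on the tag in the question; the href values are placeholders.
html = '''
<a id="thumbnail" class="yt-simple-endpoint inline-block style-scope ytd-thumbnail" href="/watch?v=example">thumb</a>
<a class="other" href="/about">about</a>
'''

soup = BeautifulSoup(html, "html.parser")

# Keyword form: class_ matches any one class in a multi-valued class attribute.
by_keyword = soup.find_all("a", class_="ytd-thumbnail")

# Dictionary form: the same filter passed through attrs.
by_attrs = soup.find_all("a", attrs={"class": "ytd-thumbnail"})

hrefs_keyword = [a.get("href") for a in by_keyword]
hrefs_attrs = [a.get("href") for a in by_attrs]
print(hrefs_keyword, hrefs_attrs)
```

Both calls should select the same tag here. If neither finds anything on the real page, it is worth checking whether the tag is present in r.text at all.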

Answer 2

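You can try using a CSS selector. I find them cleaner and more robust. Here, select creates a list of all the a tags whose class attribute contains the substring ytd-thumbnail. As a side note, I'd also recommend using the lxml parser with bs4.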

soup = BeautifulSoup(r.text, 'lxml')
for lnk in soup.select('a[class*=ytd-thumbnail]'):
    link = lnk.get("href")
    writeFile("queue.txt" , "https://youtube.com" + link)
    removeQueue(url)
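One thing to keep in mind with the substring form is that it also matches longer class names that merely contain ytd-thumbnail. A minimal sketch of the difference, assuming made-up sample markup (the ytd-thumbnail-overlay class and the href values are illustrative only) and using html.parser so it runs without lxml installed:

```python
from bs4 import BeautifulSoup

# Made-up sample markup; class names and hrefs are for illustration only.
html = '''
<a class="style-scope ytd-thumbnail" href="/watch?v=one">one</a>
<a class="ytd-thumbnail-overlay" href="/watch?v=two">two</a>
<a class="ytd-thumbnail">no href</a>
'''

soup = BeautifulSoup(html, "html.parser")

# Substring match on the whole class attribute: also hits ytd-thumbnail-overlay.
substring_hits = soup.select('a[class*="ytd-thumbnail"]')

# Class selector: matches only tags that carry ytd-thumbnail as a whole class.
exact_hits = soup.select("a.ytd-thumbnail")

# Skip tags without an href so the string concatenation cannot fail on None.
links = ["https://youtube.com" + a["href"] for a in exact_hits if a.has_attr("href")]
print(len(substring_hits), len(exact_hits), links)
```

Which selector is preferable depends on whether the related class names should be included.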

beautifulsoup4: filtering on more than just the class
