Question

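I am trying to scrape the files on this website as a personal exercise. When I run the code below, I don't understand why I am not getting the first file URL on the page. Any help is appreciated.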

import requests
from bs4 import BeautifulSoup
import regex

url = 'https://www.liberliber.it/online/autori/autori-p/niccolo-paganini/24-capricci-per-violino-solo-op-1/'
resp = requests.get(url)
soup = BeautifulSoup(resp.text, 'lxml')

# Match every tag whose href ends in ".mp3" (raw string avoids an invalid-escape warning)
files = soup.find_all(href=regex.compile(r"\.mp3$"))

urls = []
tags = []
for h in files:
    a = h.findNext('a')
    #print(a.string)
    urls.append(a.attrs['href'])
    tags.append(a.string)

The results end up shifted by one mp3 file. Why am I missing the first file and getting an extra one at the end?

Answer 1

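I don't think you want findNext, because files already contains all of the a tags. Calling findNext('a') on each match returns the a tag that comes after it, which is why every result is shifted by one. So maybe you just want: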

for h in files:
    urls.append(h.attrs['href'])
    tags.append(h.string)
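
To see the shift in isolation, here is a minimal sketch that does not touch the network. The HTML snippet, file names, and the trailing "About" link are made up for illustration and are not taken from the real page; it also uses the standard-library re module and the built-in html.parser instead of regex and lxml, and find_next, the current spelling of findNext.

```python
import re
from bs4 import BeautifulSoup

# Hypothetical stand-in for the page: three mp3 links, then one unrelated link.
html = """
<ul>
  <li><a href="capriccio-01.mp3">Capriccio 1</a></li>
  <li><a href="capriccio-02.mp3">Capriccio 2</a></li>
  <li><a href="capriccio-03.mp3">Capriccio 3</a></li>
</ul>
<a href="/about.html">About</a>
"""

soup = BeautifulSoup(html, "html.parser")
files = soup.find_all("a", href=re.compile(r"\.mp3$"))

# find_next("a") skips the matched tag itself and returns the following <a>,
# so the first file is lost and the unrelated link is picked up at the end.
shifted = [h.find_next("a")["href"] for h in files]

# Reading the attribute from each match directly gives the expected list.
direct = [h["href"] for h in files]

print(shifted)  # ['capriccio-02.mp3', 'capriccio-03.mp3', '/about.html']
print(direct)   # ['capriccio-01.mp3', 'capriccio-02.mp3', 'capriccio-03.mp3']
```

Note that if the last mp3 link had no a tag after it, find_next would return None and the subscript would raise a TypeError, which is another reason to read href from the match itself.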

Why is BeautifulSoup4 missing the first file URL?
