Question

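I'm writing a Python application that goes through all my music files and downloads the album art for each one. I use beautifulsoup4 to scrape the album covers from the last.fm website.

Is there a better way to do this? Sometimes I get an exception for making too many requests to the site. I'm looking for something like a last.fm API that gives me what I need without scraping.

I found this, but I don't think it can be used the way I need: https://github.com/pylast/pylast

I haven't tried using it, but this is how I'm doing it right now: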

import urllib

import bs4
import requests


def getAlbumArt(songDet):
    # Build a search URL from the name of the given song's album.
    try:
        # Drop any trailing parenthetical (everything from the last "(") from the name.
        albumName = songDet.albumName
        if albumName.rfind("(") != -1:
            albumName = albumName[:albumName.rfind("(")]
        r = requests.get("http://www.last.fm/search?q=" + albumName)
        html = bs4.BeautifulSoup(r.content, "html.parser")
        imagesLinks = html.find_all("ol")
        r.close()
        for imageLink in imagesLinks:
            for image in imageLink.contents:
                # Skip plain text nodes; only tags can contain an <img>.
                if isinstance(image, bs4.Tag):
                    if songDet.artist in image.text:
                        # todo add path to save pic to the mp3 path
                        urllib.urlretrieve(image.find("img").attrs['src'], image.find("img").attrs[u'alt'] + ".jpg")
                        print "got a picture"
                        return True
    except requests.RequestException as e:
        print e

Answer 1

First of all, you should definitely start using the last.fm API.
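As a rough illustration of what using the API instead of scraping could look like, here is a minimal Python 3 sketch built on the standard library only. It assumes the album.getInfo method served from http://ws.audioscrobbler.com/2.0/, a JSON response whose album.image list holds entries with "size" and "#text" keys, and an API key you have registered yourself. The function names (build_album_info_url, pick_cover_url, fetch_cover_url) are made up for this example, so check the response shape against the official API documentation before relying on it.

```python
import json
import urllib.parse
import urllib.request

# Assumed API root; verify against the official last.fm API documentation.
API_ROOT = "http://ws.audioscrobbler.com/2.0/"


def build_album_info_url(api_key, artist, album):
    """Build an album.getInfo request URL with URL-encoded parameters."""
    params = {
        "method": "album.getinfo",
        "api_key": api_key,
        "artist": artist,
        "album": album,
        "format": "json",
    }
    return API_ROOT + "?" + urllib.parse.urlencode(params)


def pick_cover_url(response, preferred_size="extralarge"):
    """Return the cover URL for preferred_size, else any non-empty one, else None."""
    images = response.get("album", {}).get("image", [])
    # Map size name -> URL, skipping entries whose URL is empty.
    by_size = {img.get("size"): img.get("#text") for img in images if img.get("#text")}
    if preferred_size in by_size:
        return by_size[preferred_size]
    # Fall back to the last non-empty entry in the list, if there is one.
    return list(by_size.values())[-1] if by_size else None


def fetch_cover_url(api_key, artist, album):
    """Query the API (network call) and return a cover image URL or None."""
    with urllib.request.urlopen(build_album_info_url(api_key, artist, album)) as resp:
        return pick_cover_url(json.load(resp))
```

A call such as fetch_cover_url("YOUR_API_KEY", songDet.artist, songDet.albumName) would then replace the search-page scraping. Because the artist is passed as its own parameter, there is no need to match it against the text of the search results.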

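Also, to avoid the "too many requests" problem when you are not using the API, be a good web-scraping citizen and introduce delays between requests. You can start with something as simple as time.sleep(delay_in_seconds) (you will, of course, need to import time).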
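The delay idea can be sketched as a small helper that pauses between consecutive requests. The name fetch_politely and its parameters are hypothetical, and the one-second default is only a placeholder, not a limit published by last.fm.

```python
import time


def fetch_politely(items, fetch, delay_in_seconds=1.0):
    """Call fetch(item) for each item, sleeping between consecutive calls."""
    results = []
    for index, item in enumerate(items):
        if index > 0:
            # Pause before every request except the first one.
            time.sleep(delay_in_seconds)
        results.append(fetch(item))
    return results
```

For example, fetch_politely(songs, getAlbumArt, delay_in_seconds=2) would run the existing function once per song with a two-second gap between calls.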

