Getting specific information from HTML code

Time: 2018-01-06 00:27:18

Tags: html python-3.x web-scraping beautifulsoup soundcloud

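The idea is to collect the SoundCloud user IDs (not names) of everyone who posted tracks starting with a given letter within a given time period. In our case, the letter is "f" and the period is "past year".

I used the filters on SoundCloud and got the results at the following URL: https://soundcloud.com/search/sounds?q=f&filter.created_at=last_year&filter.genre_or_tag=hip-hop%20%26%20rap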

I found the first user's ID ("wavey-hefner") in the following line of the HTML:

<a class="sound__coverArt" href="/wavey-hefner/foreign" draggable="true">

I want to get every user's ID from the whole HTML.

My code returns nothing :(

1 answer:

Answer 0: (score: 3)

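The page is rendered with JavaScript, so the static HTML does not contain the search results. You can use Selenium to render it. First install Selenium: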

pip3 install selenium

Then get a driver, e.g. https://sites.google.com/a/chromium.org/chromedriver/downloads (if you are on Windows or Mac, you can get a headless version of Chrome, Canary, if you like), and put the driver on your PATH.

from bs4 import BeautifulSoup
from selenium import webdriver
import time

browser = webdriver.Chrome()
url = ('https://soundcloud.com/search/sounds?q=f&filter.created_at=last_year&filter.genre_or_tag=hip-hop%20%26%20rap')
browser.get(url)
time.sleep(5)
# To make it load more scroll to the bottom of the page (repeat if you want to)
browser.execute_script("window.scrollTo(0, document.body.scrollHeight);")
time.sleep(5)
html_source = browser.page_source
browser.quit()

soup = BeautifulSoup(html_source, 'html.parser')
# Each cover-art anchor's href has the form /<user-id>/<track-slug>
for link in soup.find_all("a", {"class": "sound__coverArt"}):
    print(link.get('href'))

Output:

/tee-grizzley/from-the-d-to-the-a-feat-lil-yachty
/empire/fat-joe-remy-ma-all-the-way-up-ft-french-montana
/tee-grizzley/first-day-out
/21savage/feel-it
/pluggedsoundz/famous-dex-geek-1
/rodshootinbirds/fairytale-x-rod-da-god
/chancetherapper/finish-line-drown-feat-t-pain-kirk-franklin-eryn-allen-kane-noname
/alkermith/future-low-life-ft-the-weeknd-evol
/javon-woodbridge/fabolous-slim-thick
/hamburgerhelper/feed-the-streets-prod-dequexatron-1000
/rob-neal-139819089/french-montana-lockjaw-remix-ft-gucci-mane-kodak-black
/pluggedsoundz/famous-dex-energy
/ovosoundradiohits/future-ft-drake-used-to-this
/pluggedsoundz/famous
/a-boogie-wit-da-hoodie/fucking-kissing-feat-chris-brown
/wavey-hefner/foreign
/jalensantoy/foreplay
/yvng_swag/fall-in-luv
/rich-the-kid/intro-prod-by-lab-cook
/empire/fat-joe-remy-ma-money-showers-feat-ty-dolla-ign
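
Each href in the output has the form /<user-id>/<track-slug>, so the user ID the question asks for appears to be the first path segment. Below is a minimal sketch of that last step, assuming every link follows this layout. The helper name `extract_user_ids` is hypothetical and not part of the original answer.

```python
def extract_user_ids(hrefs):
    """Return unique user IDs (first path segment) in first-seen order.

    Assumes each href looks like /<user-id>/<track-slug>.
    """
    seen = set()
    user_ids = []
    for href in hrefs:
        if not href:
            continue  # skip anchors that had no href attribute
        # "/wavey-hefner/foreign" -> ["wavey-hefner", "foreign"]
        segments = href.strip("/").split("/")
        user_id = segments[0]
        if user_id and user_id not in seen:
            seen.add(user_id)
            user_ids.append(user_id)
    return user_ids


# A few hrefs copied from the output above
sample = [
    "/tee-grizzley/from-the-d-to-the-a-feat-lil-yachty",
    "/empire/fat-joe-remy-ma-all-the-way-up-ft-french-montana",
    "/tee-grizzley/first-day-out",
    "/wavey-hefner/foreign",
]
print(extract_user_ids(sample))
# ['tee-grizzley', 'empire', 'wavey-hefner']
```

To use it with the scraper, collect the href values from the loop into a list instead of printing them, then pass that list to the helper. Duplicates are dropped because one user can have several tracks in the results.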