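Unable to use BeautifulSoup

Date: 2019-01-14 23:11:58

Tags: python beautifulsoup

I'm trying to write a very simple script that scrapes the top 50 tracks on SoundCloud, adds them to a dictionary, and then saves them to a file. When I try to find all the items, I get nothing back (as the debug message I added shows). I'd like to know what I'm doing wrong. If anyone can help me fix this, thank you!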

from bs4 import BeautifulSoup as Bs
import requests

website = "https://soundcloud.com/charts/top?genre=rock&country=all-countries"
session = requests.session()


def get_songs():
    songs = {}
    response = session.get(website)
    soup = Bs(response.text, "html.parser")

    print(soup.title.text)

    containers = soup.find_all("li", {"class": "chartTracks__item"})

    if len(containers) == 0:
        print("Could not find any containers")

    for element in containers:
        chart_track_div = element.div("chartTrack")
        details_div = chart_track_div.div("chartTrack__details")
        artist = details_div.div("chartTrack__username").text
        song_name = details_div.div("chartTrack__title").text

        songs[song_name] = artist

    return songs


def create_file(songs_dictionary):
    # Just printing out key&value for now

    for key, value in songs_dictionary:
        print("Song: " + key)
        print("Artist: " + value)


toSave = get_songs()
create_file(toSave)

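This is what I get after running it: http://prntscr.com/m78dfr

1 Answer:

Answer 0: (score: 0)

There are a few things I needed to change.

First, this is a dynamic page, so if you want to capture that information into containers with soup.find_all("li", {"class": "chartTracks__item"}), you would first have to render the page with Selenium or {{3 }} and only then run .find_all.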

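However, the data you want to extract is also present in the HTML source, just under different tags, so I went ahead and pulled the information you were trying to capture from there.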
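A quick way to see the difference is to run both lookups against the HTML that requests actually receives. The sketch below uses a small hand-written HTML string as a stand-in for the server response; the markup is an assumption made for illustration and is not SoundCloud's real page. It shows find_all returning nothing for a class that only exists after JavaScript runs, while the static markup is still reachable.

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in for requests.get(...).text; the real markup may differ.
STATIC_HTML = """
<html><head><title>Charts</title></head><body>
<div id="app"></div>
<section class="sounds">
  <ul>
    <li><a itemprop="url" href="/a/song-1">Song 1</a> by <a href="/a">Artist A</a></li>
    <li><a itemprop="url" href="/b/song-2">Song 2</a> by <a href="/b">Artist B</a></li>
  </ul>
</section>
</body></html>
"""


def count_matches(html):
    """Return (items found by the JS-rendered class, items found in the static section)."""
    soup = BeautifulSoup(html, "html.parser")
    # This class is only added by client-side JavaScript, so it is absent here.
    dynamic_items = soup.find_all("li", {"class": "chartTracks__item"})
    # The static fallback markup is part of the raw source.
    section = soup.find("section", {"class": "sounds"})
    static_items = section.find_all("li") if section else []
    return len(dynamic_items), len(static_items)


dynamic_count, static_count = count_matches(STATIC_HTML)
print("dynamic:", dynamic_count, "static:", static_count)
```

If the first count is zero for the real page as well, the element is being created in the browser and a plain HTTP request will never see it.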

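Second, I don't know if this is exactly what you intended, but you are taking the username as the artist and the title as the song. Unfortunately, each of these tracks is listed by SoundCloud in a slightly different format. If you want strictly the performing artist and the song title, you will need to do some filtering and rework the strings. I left it as is, though, and you can decide what to do from there.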
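As one possible starting point for that filtering, here is a minimal sketch of a heuristic that splits titles written as "Artist - Title" and otherwise falls back to the uploader name. The function name split_artist_title and the " - " convention are assumptions for illustration; SoundCloud does not guarantee any title format.

```python
def split_artist_title(raw_title, uploader):
    """Guess (artist, title) from a track title and its uploader name.

    Heuristic only: assumes 'Artist - Title' when the title contains ' - ',
    otherwise treats the uploader as the artist.
    """
    left, separator, right = raw_title.partition(" - ")
    if separator and left.strip() and right.strip():
        return left.strip(), right.strip()
    return uploader, raw_title.strip()


print(split_artist_title("Queen - Bohemian Rhapsody", "rizky.rilos"))
print(split_artist_title("KING", "XXXTENTACION"))
```

The heuristic is wrong for titles written the other way around, such as "Wonderwall - Oasis", so some manual rules or a list of known artists would still be needed.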

Third, you were not passing any argument to the first function:

def get_songs():
    songs = {}
    response = session.get(website)

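The function body refers to website, but nothing is ever passed in. It still runs, because website is a module-level global that the function can read, but passing the URL explicitly makes the dependency obvious and the function reusable. So I changed it to: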

def get_songs(website):
    songs = {}
    response = session.get(website)

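Fourth, you can't iterate over the dictionary with for key, value in songs_dictionary:. Iterating over a dict yields only its keys, so Python tries to unpack each key string into two names and raises a ValueError (unless a key happens to be exactly two characters long). To do what you are trying to do, you have 2 options: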

for key, value in songs_dictionary.items():
    print("Song: " + key)
    print("Artist: " + value)

for key in songs_dictionary:
    print("Song: " + key)
    print("Artist: " + songs_dictionary[key])
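The difference between the failing loop and the two working options can be checked on a tiny dictionary. This is a minimal sketch; the sample entries and the helper names are made up for illustration.

```python
songs = {"KING": "XXXTENTACION", "In The End": "LINKIN_PARK"}


def unpacking_keys_fails(songs_dictionary):
    """Return True if 'for key, value in dict' raises ValueError."""
    try:
        for key, value in songs_dictionary:  # iterates over keys only
            pass
    except ValueError:
        # e.g. "KING" has four characters, which cannot be unpacked into two names
        return True
    return False


def describe(songs_dictionary):
    """Build one line per entry using .items(), which yields (key, value) pairs."""
    return [f"Song: {song} / Artist: {artist}"
            for song, artist in songs_dictionary.items()]


print(unpacking_keys_fails(songs))
for line in describe(songs):
    print(line)
```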

I think that's everything I found. Here is the full code:

from bs4 import BeautifulSoup as Bs
import requests

website = "https://soundcloud.com/charts/top?genre=rock&country=all-countries"
session = requests.session()


def get_songs(website):
    songs = {}
    response = session.get(website)
    soup = Bs(response.text, "html.parser")

    print(soup.title.text)

    containers = soup.find_all("section", {"class": "sounds"})
    songs_ranks = containers[0].find_all('li')


    if len(songs_ranks) == 0:
        print("Could not find any containers")

    for element in songs_ranks:

        artist = element.find_all('a')[1].text
        song_name = element.find('a', {'itemprop':'url'}).text

        songs[song_name] = artist

    return songs


def create_file(songs_dictionary):
    # Just printing out key&value for now

    for key, value in songs_dictionary.items():
        print("Song: " + key)
        print("Artist: " + value)


toSave = get_songs(website)
create_file(toSave)

Output:

Song: KING
Artist: XXXTENTACION
Song: Queen - Bohemian Rhapsody
Artist: rizky.rilos
Song: 봄날 (Brit Rock Remix For 가요대축제) - BTS
Artist: BTS
Song: XXXTENTACION - NUMB
Artist: conrad foxx
Song: In The End
Artist: LINKIN_PARK
Song: I Write Sins Not Tragedies
Artist: Panic! At The Disco
Song: Man Upon The Hill
Artist: Stars and Rabbit
Song: Nirvana - Smells like teen spirit
Artist: Rocio Araujo
Song: Nickelback - Rockstar
Artist: Roadrunner USA
Song: xxxtentacion - valentine
Artist: ó  
Song: Zombie
Artist: Bad Wolves
Song: Marília Mendonça – Amante Não Tem Lar
Artist: Sertanejo Repost
Song: sleep thru ur alarms
Artist: Lontalius
Song: Angel With A Shotgun
Artist: NightCore
Song: Nightcore - My Demons
Artist: NightCore
Song: Armada - Harusnya Aku
Artist: DJCantik.com
Song: Dont Stop Me Now - Queen
Artist: Zinay Hernandez
Song: Sing To Me feat. Karen O
Artist: waltermartinmusic
Song: Everytime
Artist: boy pablo
Song: Tongue Tied - Grouplove
Artist: Atlantic Records
Song: For Beginners
Artist: M. Ward
Song: This Is Gospel
Artist: Panic! At The Disco
Song: Skillet - Hero
Artist: Warner Music Nashville
Song: Wonderwall - Oasis
Artist: Florian.N.
Song: High Hopes - Panic! At the disco
Artist: IrisDH
Song: Another One Bites The Dust (Remastered 2011)
Artist: Queen
Song: Panic! At The Disco - Bohemian Rhapsody (from Suicide Squad: The Album) (Audio)
Artist: Panic! At The Disco
Song: Killer Queen (Remastered 2011)
Artist: Queen
Song: Blue Bird-Naruto Shippuden 3rd Opening Theme
Artist: flaviogomes23
Song: Virzha-tentang rindu mp3
Artist: Arjuna Bilal
Song: Tipe-X - Mawar Hitam
Artist: Tora Loaadiing
Song: Lolot - Galungan Lan Kuningan
Artist: I Made Suwita
Song: Red Hot Chili peppers - Californication
Artist: arthyum
Song: Nickelback - How You Remind Me
Artist: Roadrunner USA
Song: 2004 Green Day "Boulevard of broken dreams" Vinyl rip
Artist: Collin Codeïne
Song: Zé Neto E Cristiano -  Seu Polícia (DVD Zé Neto E Cristiano Ao Vivo Em São José Do Rio Preto)
Artist: Sertanejo universitario (2018)
Song: Pink Floyd - Wish You Were Here
Artist: Ulviyya Ali
Song: Apocalypse
Artist: Cigarettes After Sex
Song: Linkin Park - In The End
Artist: ALLMusic
Song: Come As You Are
Artist: Nirvana
Song: Avenged Sevenfold - Dear God
Artist: Malik Hamza Sajjad
Song: Kaleo - Way Down We Go
Artist: AminAshkan
Song: Ya Qurban, Khumariyaan, Coke Studio Season 11, Episode 7
Artist: CokeStudio
Song: IDOL (Korean classical music ver.)_2018MMA VER.
Artist: Atm Soo
Song: Gym Best Music For Workout vol 2
Artist: Gym Best MusicFor Workout
Song: Do I Wanna Know? - Arctic Monkeys
Artist: Teenage Kicks.
Song: Um44k - Nossa Música ♪♫
Artist: Portal do Rap
Song: Nanatsu No Taizai (The Seven Deadly Sins) Anime OST - Perfect Time (POWER SONG)
Artist: cobritsa
Song: Tipe X - Genit
Artist: Hilmie CintaSederhana
Song: Kodaline - All I want - Acoustic Performance
Artist: Andy Wells 1
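
If non-ASCII titles come out garbled when you run the script, one possible cause (an assumption, not something the answer above confirms) is that the response bytes were decoded with the wrong charset. The sketch below simulates this with hand-written UTF-8 bytes standing in for response.content, so it needs no network access; the sample title is made up.

```python
from bs4 import BeautifulSoup

# Hypothetical UTF-8 page bytes, standing in for response.content.
PAGE_BYTES = '<li><a itemprop="url">Zé Neto – Seu Polícia ♪</a></li>'.encode("utf-8")


def title_from_bytes(raw, encoding):
    """Decode the raw bytes with the given charset and extract the link text."""
    html = raw.decode(encoding)
    return BeautifulSoup(html, "html.parser").find("a").text


wrong = title_from_bytes(PAGE_BYTES, "latin-1")  # mis-decoded, accents turn into junk
right = title_from_bytes(PAGE_BYTES, "utf-8")    # decoded with the charset actually used

print(right)
# Text that was mis-decoded as Latin-1 can be repaired by reversing the step.
print(wrong.encode("latin-1").decode("utf-8") == right)
```

With requests, the equivalent would be to set response.encoding = "utf-8" before reading response.text, or to pass response.content to BeautifulSoup and let it detect the encoding.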