How do I write data to a new column in a CSV when web scraping?

Time: 2018-05-26 19:37:07

Tags: python csv web-scraping beautifulsoup code-formatting

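I'm scraping the Billboard Hot R&B/Hip-Hop chart. I can get all the data, but when I write it to a CSV the formatting is completely wrong.

The data for Last Week Number, Peak Position, and Weeks On Chart all shows up under the first three columns of my CSV instead of under its own headers.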

Here is my current code:

from urllib.request import urlopen as uReq
from bs4 import BeautifulSoup as soup

my_url = 'https://www.billboard.com/charts/r-b-hip-hop-songs'

# Opens web connection and grabs page
uClient = uReq(my_url)
page_html = uClient.read()
uClient.close()

# HTML parsing
page_soup = soup(page_html, "html.parser")

# Grabs song title, artist and picture
mainContainer = page_soup.findAll("div", {"class":"chart-row__main-display"})

# CSV filename creation
filename = "Billboard_Hip_Hop_Charts.csv"
f = open(filename, "w")

# Creating Headers
headers = "Billboard Number, Artist Name, Song Title, Last Week Number, Peak Position, Weeks On Chart\n"
f.write(headers)

# Get Billboard Number, Artist Name and Song Title 
for container in mainContainer:
    # Gets billboard number
    billboard_number = container.div.span.text

    # Gets artist name
    artist_name_a_tag = container.findAll("", {"class":"chart-row__artist"})
    artist_name = artist_name_a_tag[0].text.strip()

    # Gets song title
    song_title = container.h2.text

    print("Billboard Number: " + billboard_number)
    print("Artist Name: " + artist_name)
    print("Song Title: " + song_title)

    f.write(billboard_number + "," + artist_name + "," + song_title + "\n")

# Grabs side container from main container
secondaryContainer = page_soup.findAll("div", {"class":"chart-row__secondary"})

# Get Last Week Number, Peak Position and Weeks On Chart
for container in secondaryContainer:
    # Gets last week number
    last_week_number_tag = container.findAll("", {"class":"chart-row__value"})
    last_week_number = last_week_number_tag[0].text

    # Gets peak position
    peak_position_tag = container.findAll("", {"class":"chart-row__value"})
    peak_position = peak_position_tag[1].text

    # Gets week on chart
    weeks_on_chart_tag = container.findAll("", {"class":"chart-row__value"})
    weeks_on_chart = weeks_on_chart_tag[2].text

    print("Last Week Number: " + last_week_number)
    print("Peak Position: " + peak_position)
    print("Weeks On Chart: " + weeks_on_chart)

    f.write(last_week_number + "," + peak_position + "," + weeks_on_chart + "\n")

f.close()

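This is what my CSV looks like, with the headers Billboard Number, Artist Name, Song Title, Last Week Number, Peak Position, Weeks On Chart: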

1  Drake                                          Nice For What               
2  Post Malone Featuring Ty Dolla $ign            Psycho                      
3  Drake                                          God's Plan                  
4  Post Malone                                    Better Now                  
5  Post Malone Featuring 21 Savage                Rockstar                    
6  BlocBoy JB Featuring Drake                     Look Alive                  
7  Post Malone                                    Paranoid                    
8  Lil Dicky Featuring Chris Brown                Freaky Friday               
9  Post Malone                                    Rich & Sad                  
10 Post Malone Featuring Swae Lee                 Spoil My Night              
11 Post Malone Featuring Nicki Minaj              Ball For Me                 
12 Migos Featuring Drake                          Walk It Talk It             
13 Post Malone Featuring G-Eazy & YG              Same Bitches                
14 Cardi B| Bad Bunny & J Balvin                  I Like It                   
15 Post Malone                                    Zack And Codeine            
16 Post Malone                                    Over Now                    
17 Cardi B                                        Be Careful                  
18 Post Malone                                    Takin' Shots                
19 The Weeknd & Kendrick Lamar                    Pray For Me                 
20 Rich The Kid                                   Plug Walk                   
21 The Weeknd                                     Call Out My Name            
22 Bruno Mars & Cardi B                           Finesse                     
23 Post Malone                                    Candy Paint                 
24 Ella Mai                                       Boo'd Up                    
25 Rae Sremmurd & Juicy J                         Powerglide                  
26 Post Malone                                    92 Explorer                 
27 J. Cole                                        ATM                         
28 J. Cole                                        KOD                         
29 Post Malone                                    Otherside                   
30 Post Malone                                    Blame It On Me              
31 J. Cole                                        Kevin's Heart               
32 Kendrick Lamar & SZA                           All The Stars               
33 Nicki Minaj                                    Chun-Li                     
34 Lil Pump                                       Esskeetit                   
35 Migos                                          Stir Fry                    
36 Famous Dex                                     Japan                       
37 Post Malone                                    Sugar Wraith                
38 Cardi B Featuring Migos                        Drip                        
39 XXXTENTACION                                   Sad!                        
40 Jay Rock| Kendrick Lamar| Future & James Blake King's Dead                 
41 Rich The Kid Featuring Kendrick Lamar          New Freezer                 
42 Logic & Marshmello                             Everyday                    
43 J. Cole                                        Motiv8                      
44 YoungBoy Never Broke Again                     Outside Today               
45 Post Malone                                    Jonestown (Interlude)       
46 Cardi B Featuring 21 Savage                    Bartier Cardi               
47 YoungBoy Never Broke Again                     Overdose                    
48 J. Cole                                        1985 (Intro To The Fall Off)
49 J. Cole                                        Photograph                  
50 Khalid| Ty Dolla $ign & 6LACK                  OTW
1  1                                              2
2  1                                              6
3  1                                              17
4  2                                              12
5  3                                              14
10 6                                              8
...

Any help with getting the data into the correct columns would be appreciated!

1 Answer:

Answer 0: (Score: 1)

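Your code is unnecessarily cluttered and hard to read. The columns end up misaligned because each of the two loops writes its own rows, so the second loop's values land in new rows under the first three columns. You don't need two containers at all; a single one is enough to get all the required data. Try the following approach, which writes a CSV with the data filled in under the right columns: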


import requests, csv
from bs4 import BeautifulSoup

url = 'https://www.billboard.com/charts/r-b-hip-hop-songs'

with open('Billboard_Hip_Hop_Charts.csv', 'w', newline='') as f:
    writer = csv.writer(f)
    writer.writerow(['Billboard Number','Artist Name','Song Title','Last Week Number','Peak Position','Weeks On Chart'])

    res = requests.get(url)
    soup = BeautifulSoup(res.text, "html.parser")

    # Each chart-row article is searched for all six fields of one song
    for container in soup.find_all("article", class_="chart-row"):
        billboard_number = container.find(class_="chart-row__current-week").text
        artist_name = container.find(class_="chart-row__artist").text.strip()
        song_title = container.find(class_="chart-row__song").text

        # The chart-row__value elements are read in order:
        # last week number, peak position, weeks on chart
        last_week_number_tag = container.find(class_="chart-row__value")
        last_week_number = last_week_number_tag.text

        peak_position_tag = last_week_number_tag.find_parent().find_next_sibling().find(class_="chart-row__value")
        peak_position = peak_position_tag.text

        weeks_on_chart = peak_position_tag.find_parent().find_next_sibling().find(class_="chart-row__value").text

        # One writerow call per song keeps all six values on the same row
        row = [billboard_number, artist_name, song_title, last_week_number, peak_position, weeks_on_chart]
        print(*row)
        writer.writerow(row)
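
If you would rather keep two separate passes over the page, a rough sketch of the same idea is to collect the values from each pass into lists first and pair them up with zip before writing, so that every song still gets exactly one row. The example below uses only the standard library and a couple of hard-coded rows taken from the question's output in place of scraped values. The function name write_chart and the in-memory buffer are illustrative assumptions, not part of the original code.

```python
import csv
import io

# Stand-ins for the values scraped in the question's first loop
# (billboard number, artist name, song title).
main_rows = [
    ("1", "Drake", "Nice For What"),
    ("2", "Post Malone Featuring Ty Dolla $ign", "Psycho"),
]

# Stand-ins for the values scraped in the question's second loop
# (last week number, peak position, weeks on chart).
secondary_rows = [
    ("1", "1", "2"),
    ("2", "1", "6"),
]


def write_chart(out, main_rows, secondary_rows):
    """Hypothetical helper: write one six-column CSV row per song."""
    writer = csv.writer(out)
    writer.writerow([
        "Billboard Number", "Artist Name", "Song Title",
        "Last Week Number", "Peak Position", "Weeks On Chart",
    ])
    # zip pairs the n-th entry of each list, so both halves of a
    # song's data end up in the same row instead of separate rows.
    for main, secondary in zip(main_rows, secondary_rows):
        writer.writerow([*main, *secondary])


# An in-memory buffer is used here instead of a real file.
buffer = io.StringIO()
write_chart(buffer, main_rows, secondary_rows)
csv_text = buffer.getvalue()
print(csv_text)
```

To write to disk instead, pass a file opened with open(filename, "w", newline="") as in the answer above. Note that zip stops at the shorter list, so this only lines up correctly if both passes return the same number of entries in the same order.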