How do I get back all the entries instead of only the first one?

Date: 2019-07-02 10:46:03

Tags: python-3.x web-scraping

I'm still pretty new to Python and still learning to program.

I want to scrape the titles and artists from this page: https://www.billboard.com/charts/country-airplay/1990-01-20 and then arrange them in a table format.

I've already been able to pull the items with bs4 / requests like this:

for title in soup.find_all('div', attrs={'class':'chart-list-item__title'}):
    print(title.text)

for artist in soup.find_all('div', attrs={'class':'chart-list-item__artist'}):
    print(artist.text)

But when I try to assign the object to a variable, it only brings back the first item.

title1 = title.text
print(title1)

How can I bring back all of them?

import requests
r = requests.get('https://www.billboard.com/charts/country-airplay/1990-01-20')

from bs4 import BeautifulSoup
soup = BeautifulSoup(r.text, 'html.parser')

for title in soup.find_all('div', attrs={'class':'chart-list-item__title'}):
    print(title.text)

for artist in soup.find_all('div', attrs={'class':'chart-list-item__artist'}):
    print(artist.text)

title1 = title.text
print(title1)

2 answers:

Answer 0 (score: 1)

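Define a loop over the elements with the class chart-list-item, then pick out the fields you want inside that loop. The following script should print the rank, artist, and album for each entry.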

import requests
from bs4 import BeautifulSoup

r = requests.get('https://www.billboard.com/charts/country-airplay/1990-01-20')
soup = BeautifulSoup(r.text, 'html.parser')

for item in soup.find_all(class_="chart-list-item"):
    rank = item.find(class_="chart-list-item__rank").get_text(strip=True)
    artist = item.find(class_="chart-list-item__artist").get_text(strip=True)
    album = item.find(class_="chart-list-item__title-text").get_text(strip=True)
    print(rank,artist,album)

The output looks something like:

1 Clint Black Nobody's Home
2 Tanya Tucker My Arms Stay Open All Night
3 Ricky Van Shelton Statue Of A Fool
4 Alabama Southern Star
5 Keith Whitley It Ain't Nothin'
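
As for why the question's title1 = title.text yields a single value: that assignment runs after the for loop has finished, so title refers only to the last element the loop visited. A minimal sketch, with plain strings standing in for the scraped tags (the sample values come from the output above; the names last_seen and collected are purely illustrative):

```python
# Plain strings stand in for the tags returned by soup.find_all(...).
titles = ["Nobody's Home", "My Arms Stay Open All Night", "Statue Of A Fool"]

# Assigning after the loop keeps only what the loop variable last pointed to.
for title in titles:
    pass
last_seen = title

# Appending inside the loop keeps every value.
collected = []
for title in titles:
    collected.append(title)

print(last_seen)
print(collected)
```

The same idea applies to the loop above: append each (rank, artist, album) tuple to a list inside the loop if the values are needed later rather than just printed.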

Answer 1 (score: 0)

You can use the zip function to combine the data.

i.text.strip() strips the surrounding whitespace, including the trailing newline (\n).

import pandas as pd   
import requests
from bs4 import BeautifulSoup

r = requests.get('https://www.billboard.com/charts/country-airplay/1990-01-20')
soup = BeautifulSoup(r.text, 'html.parser')    

title = [i.text.strip() for i in soup.find_all('div', attrs={'class': 'chart-list-item__title'})]
artist = [i.text.strip() for i in soup.find_all('div', attrs={'class': 'chart-list-item__artist'})]

print(list(zip(artist,title)))

Output:

[('Clint Black', "Nobody's Home"), ('Tanya Tucker', 'My Arms Stay Open All Night'),........]
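
One caveat with zip, sketched here on hand-written lists rather than live page data: it pairs items purely by position and stops at the shorter input, so a chart entry without an artist div would silently drop or misalign pairs. itertools.zip_longest from the standard library makes such a mismatch visible. The sample lists and the empty-string fill value are assumptions chosen for illustration.

```python
from itertools import zip_longest

# Raw text as it might come out of the tags, with surrounding newlines.
raw_titles = ["\nNobody's Home\n", "\nSouthern Star\n", "\nIt Ain't Nothin'\n"]
raw_artists = ["\nClint Black\n", "\nAlabama\n"]  # one artist deliberately missing

# strip() removes the leading and trailing whitespace from each string.
titles = [t.strip() for t in raw_titles]
artists = [a.strip() for a in raw_artists]

# zip stops at the shorter list, so the third title is dropped.
pairs = list(zip(artists, titles))

# zip_longest pads the shorter list, so the gap shows up in the result.
padded = list(zip_longest(artists, titles, fillvalue=""))

print(pairs)
print(padded)
```

On Python 3.10 or later, zip(artists, titles, strict=True) raises a ValueError on unequal lengths instead of truncating.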

To store the data in a DataFrame using pandas:

import pandas as pd   
import requests
from bs4 import BeautifulSoup

r = requests.get('https://www.billboard.com/charts/country-airplay/1990-01-20')
soup = BeautifulSoup(r.text, 'html.parser')    

title = [i.text.strip() for i in soup.find_all('div', attrs={'class': 'chart-list-item__title'})]
artist = [i.text.strip() for i in soup.find_all('div', attrs={'class': 'chart-list-item__artist'})]

data = list(zip(title, artist))
# Two columns of data need exactly two column names
dt = pd.DataFrame(data, columns=['Title', 'Artist'])
print(dt)

Output:

                                            Title                             Artist
0                                   Nobody's Home                        Clint Black
1                     My Arms Stay Open All Night                       Tanya Tucker
2                                Statue Of A Fool                  Ricky Van Shelton
3                                   Southern Star                            Alabama
4                                It Ain't Nothin'                      Keith Whitley
5                                  It's You Again                         Skip Ewing
6                   When I Could Come Home To You                      Steve Wariner
7                  Many A Long & Lonesome Highway                     Rodney Crowell
8                         That Just About Does It                        Vern Gosdin
9                            Start All Over Again               The Desert Rose Band
10                              Out Of Your Shoes                      Lorrie Morgan
11                              On Second Thought                      Eddie Rabbitt
12                                  One Man Woman                          The Judds
13                   Till I Can't Take It Anymore                    Billy Joe Royal
14                              Overnight Success                      George Strait
15                              Where've You Been                       Kathy Mattea
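
If pandas is not available, a plain fixed-width table can be produced with the standard library alone. This is only a sketch under that assumption: the rows are hard-coded from the output above, and the helper name format_row is hypothetical.

```python
# Sample (title, artist) pairs copied from the output above.
rows = [
    ("Nobody's Home", "Clint Black"),
    ("My Arms Stay Open All Night", "Tanya Tucker"),
    ("Statue Of A Fool", "Ricky Van Shelton"),
]
header = ("Title", "Artist")

# Column width = longest cell in that column, header included.
widths = [max(len(row[i]) for row in [header, *rows]) for i in range(2)]

def format_row(row):
    # Left-align each cell, pad it to its column width, drop trailing spaces.
    return "  ".join(cell.ljust(width) for cell, width in zip(row, widths)).rstrip()

lines = [format_row(header)] + [format_row(r) for r in rows]
print("\n".join(lines))
```

In a real script, rows would be the list(zip(title, artist)) result built from the scraped page.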