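I'm still very new to Python, but I'm learning to program.
I'm trying to scrape the titles and artists from this page: https://www.billboard.com/charts/country-airplay/1990-01-20
and then arrange them in a table format.
I can already extract the items with bs4 / requests like this: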
for title in soup.find_all('div', attrs={'class':'chart-list-item__title'}):
    print(title.text)
for artist in soup.find_all('div', attrs={'class':'chart-list-item__artist'}):
    print(artist.text)
But when I try to assign the result to a variable, it only brings back the first item.
title1 = title.text
print(title1)
How can I get all of the titles back?
import requests
from bs4 import BeautifulSoup
r = requests.get('https://www.billboard.com/charts/country-airplay/1990-01-20')
soup = BeautifulSoup(r.text, 'html.parser')
for title in soup.find_all('div', attrs={'class':'chart-list-item__title'}):
    print(title.text)
for artist in soup.find_all('div', attrs={'class':'chart-list-item__artist'}):
    print(artist.text)
title1 = title.text
print(title1)
Answer 0 (score: 1)
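Loop over the elements with the class chart-list-item, then pick out the fields you want to capture inside that loop. The following script should produce the rank, artist, and album name for each entry (the album variable holds the text of chart-list-item__title-text, i.e. the song title).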
import requests
from bs4 import BeautifulSoup
r = requests.get('https://www.billboard.com/charts/country-airplay/1990-01-20')
soup = BeautifulSoup(r.text, 'html.parser')
for item in soup.find_all(class_="chart-list-item"):
    rank = item.find(class_="chart-list-item__rank").get_text(strip=True)
    artist = item.find(class_="chart-list-item__artist").get_text(strip=True)
    album = item.find(class_="chart-list-item__title-text").get_text(strip=True)
    print(rank, artist, album)
The output looks like:
1 Clint Black Nobody's Home
2 Tanya Tucker My Arms Stay Open All Night
3 Ricky Van Shelton Statue Of A Fool
4 Alabama Southern Star
5 Keith Whitley It Ain't Nothin'
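A single assignment such as title1 = title.text can hold only one value at a time, which is why printing it shows one item. A minimal sketch of keeping every row instead, by appending to a list inside the loop. The helper name collect_rows and the SAMPLE_ITEMS data are hypothetical stand-ins (the values are copied from the output above) so the sketch runs without network access; in the real script the three values would come from the item.find(...).get_text(strip=True) calls.

```python
# Hypothetical sample of (rank, artist, title) values, copied from the
# output shown above; the real script would read them from each item tag.
SAMPLE_ITEMS = [
    ("1", "Clint Black", "Nobody's Home"),
    ("2", "Tanya Tucker", "My Arms Stay Open All Night"),
    ("3", "Ricky Van Shelton", "Statue Of A Fool"),
]


def collect_rows(items):
    """Return one dict per chart entry instead of printing inside the loop."""
    rows = []
    for rank, artist, title in items:
        # Appending inside the loop keeps every entry, not just one of them.
        rows.append({"rank": rank, "artist": artist, "title": title})
    return rows


rows = collect_rows(SAMPLE_ITEMS)
for row in rows:
    print(row["rank"], row["artist"], row["title"])
```

Once the rows live in a list, they can be sorted, filtered, or handed to a table library after the loop has finished.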
Answer 1 (score: 0)
You can use the zip function to combine the data. i.text.strip() removes leading and trailing whitespace, including the trailing newline \n.
import pandas as pd
import requests
from bs4 import BeautifulSoup
r = requests.get('https://www.billboard.com/charts/country-airplay/1990-01-20')
soup = BeautifulSoup(r.text, 'html.parser')
title = [i.text.strip() for i in (soup.find_all('div', attrs={'class':'chart-list-item__title'}))]
artist = [i.text.strip() for i in (soup.find_all('div', attrs={'class':'chart-list-item__artist'}))]
print(list(zip(artist,title)))
[('Clint Black', "Nobody's Home"), ('Tanya Tucker', 'My Arms Stay Open All Night'),........]
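To see what strip() and zip do in isolation, here is a small sketch that needs no network access. The raw_titles and raw_artists lists are hypothetical stand-ins for the .text values of the scraped tags, assumed here to carry surrounding newlines.

```python
# Hypothetical raw strings, standing in for tag.text with stray newlines.
raw_titles = ["\nNobody's Home\n", "\nMy Arms Stay Open All Night\n"]
raw_artists = ["\nClint Black\n", "\nTanya Tucker\n"]

# strip() drops leading and trailing whitespace, including "\n".
titles = [t.strip() for t in raw_titles]
artists = [a.strip() for a in raw_artists]

# zip pairs the lists element by element: first with first, and so on.
pairs = list(zip(artists, titles))
print(pairs)

# zip stops at the shorter input, so a missing artist or title would
# silently drop trailing entries rather than raise an error.
uneven = list(zip(["a", "b", "c"], ["x"]))
print(uneven)
```

Because zip pairs items purely by position, it is worth checking that both lists have the same length before relying on the result.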
import pandas as pd
import requests
from bs4 import BeautifulSoup
r = requests.get('https://www.billboard.com/charts/country-airplay/1990-01-20')
soup = BeautifulSoup(r.text, 'html.parser')
title = [i.text.strip() for i in (soup.find_all('div', attrs={'class':'chart-list-item__title'}))]
artist = [i.text.strip() for i in (soup.find_all('div', attrs={'class':'chart-list-item__artist'}))]
data = list(zip(title, artist))
dt = pd.DataFrame(data, columns=['Title', 'Artist'])
print(dt)
    Title                           Artist
 0  Nobody's Home                   Clint Black
 1  My Arms Stay Open All Night     Tanya Tucker
 2  Statue Of A Fool                Ricky Van Shelton
 3  Southern Star                   Alabama
 4  It Ain't Nothin'                Keith Whitley
 5  It's You Again                  Skip Ewing
 6  When I Could Come Home To You   Steve Wariner
 7  Many A Long & Lonesome Highway  Rodney Crowell
 8  That Just About Does It         Vern Gosdin
 9  Start All Over Again            The Desert Rose Band
10  Out Of Your Shoes               Lorrie Morgan
11  On Second Thought               Eddie Rabbitt
12  One Man Woman                   The Judds
13  Till I Can't Take It Anymore    Billy Joe Royal
14  Overnight Success               George Strait
15  Where've You Been               Kathy Mattea
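If the goal is a table that can be opened elsewhere, the same (title, artist) pairs can also be written out as CSV. This is a sketch using only the standard library rather than pandas; the helper name to_csv_text is hypothetical, and the sample pairs are copied from the first rows of the output above.

```python
import csv
import io

# Sample (title, artist) pairs copied from the first rows shown above.
pairs = [
    ("Nobody's Home", "Clint Black"),
    ("My Arms Stay Open All Night", "Tanya Tucker"),
    ("Statue Of A Fool", "Ricky Van Shelton"),
]


def to_csv_text(rows, header=("Title", "Artist")):
    """Render (title, artist) pairs as CSV text with a header row."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(header)
    writer.writerows(rows)
    return buffer.getvalue()


csv_text = to_csv_text(pairs)
print(csv_text)
```

Writing to an in-memory buffer keeps the sketch side-effect free; passing an opened file object to csv.writer instead would save the table to disk.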