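Hi, I'm trying to grab the top 200 songs/artists from the Billboard chart.
I gather I need to use something like
curl http://www.billboard.com/charts/billboard-200 | grep "<h1>" ??
This grabs the songs on the first page and prints them, but the output looks ugly.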
Answer 0 (score: 0)
Here is something to get you going:
curl --silent http://www.billboard.com/charts/billboard-200 | awk -vRS="id=\"rank_" -F"\n" 'NR>1 {split($4,m," *</?h1>");melody=m[2];split($6,a,"\"");artist=a[4];print "("$1+0") "melody " - " artist}'
(1) 1989 - Taylor Swift
(2) x - Ed Sheeran
(3) The Pinkprint - Nicki Minaj
(4) In The Lonely Hour - Sam Smith
(5) SremmLife - Rae Sremmurd
(6) Hozier - Hozier
(7) 2014 Forest Hills Drive - J. Cole
(8) Guardians Of The Galaxy: Awesome Mix Vol. 1 - Soundtrack
(9) FOUR - One Direction
(10) My Everything - Ariana Grande
(11) Into The Woods - Soundtrack
(12) Montevallo - Sam Hunt
(13) Annie - Soundtrack
(14) Greatest Hits: Decade #1 - Carrie Underwood
(15) V - Maroon 5
(16) Frozen - Soundtrack
(17) Old Boots, New Dirt - Jason Aldean
(18) Reclassified - Iggy Azalea
(19) Native - OneRepublic
(20) 1000 Forms Of Fear - Sia
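To see how that one-liner splits the page without touching the network, here is a minimal sketch run against a made-up snippet. The markup in `html` and the function name `parse_chart` are assumptions for illustration only; the snippet is shaped to match what the awk program expects (rank right after `id="rank_`, title in an `<h1>` on the 4th line of the record, artist as the second quoted value on the 6th line) and is not copied from billboard.com. A multi-character `RS` is not guaranteed by POSIX awk, so this assumes gawk or mawk.

```shell
#!/bin/sh
# Hypothetical markup: only shaped like what the one-liner expects,
# not taken from the real page.
html='<article id="rank_1">
<div>
<div>
<h1>1989</h1>
<p>
<a href="/artist/x" title="Taylor Swift">
</article>
<article id="rank_2">
<div>
<div>
<h1>x</h1>
<p>
<a href="/artist/y" title="Ed Sheeran">
</article>'

# parse_chart (hypothetical name): reads HTML on stdin, prints one line per entry.
parse_chart() {
  # RS splits the input into one record per chart entry;
  # FS = newline makes every line of a record its own field.
  awk -v RS='id="rank_' -F'\n' '
    NR > 1 {                          # record 1 is whatever precedes the first entry
      rank = $1 + 0                   # leading digits of the first line, e.g. 1"> becomes 1
      split($4, m, " *</?h1>")        # 4th line holds <h1>title</h1>
      title = m[2]
      split($6, a, "\"")              # 6th line: the 2nd quoted value is the artist
      artist = a[4]
      print "(" rank ") " title " - " artist
    }'
}

printf '%s\n' "$html" | parse_chart
```

The field positions (`$4`, `$6`) are tied to the exact line layout of the page, so they will likely need adjusting whenever the site's markup changes.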
Answer 1 (score: 0)
$ for i in {0..9}; do
    saxon-lint --html --xquery '
        for $a in //article[@id]
        let $chart := $a//span[@class="this-week"]/text()
        let $artist := normalize-space($a//div[@class="row-title"]/h3/a/text())
        let $song := normalize-space($a//h2/text())
        let $link := string($a//div[@class="row-title"]/h3/a/@href)
        return
            concat(
                "[", $chart, "] ", $song, " - ", $artist,
                " : http://www.billboard.com", $link
            )
    ' "http://www.billboard.com/charts/billboard-200?page=$i"
done
[1] 1989 - Taylor Swift : http://www.billboard.com/artist/371422/taylor-swift
[2] x - Ed Sheeran : http://www.billboard.com/artist/276089/ed-sheeran
[3] The Pinkprint - Nicki Minaj : http://www.billboard.com/artist/312259/nicki-minaj
...
[200] Take Me Home - One Direction : http://www.billboard.com/artist/314021/one-direction