Using curl | grep to get the Billboard top 200

Time: 2015-01-20 15:06:48

Tags: curl grep

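Hi, I'm trying to grab the Billboard top 200 songs/artists.

I understand that I need something like     curl http://www.billboard.com/charts/billboard-200 | grep "<h1>"     This fetches the songs on the first page and prints them, but the output looks ugly.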
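The "ugly" part is mostly the leftover tags and indentation around each match. A minimal sketch of cleaning that up, assuming each title sits on a single line as <h1>...</h1>; the helper name strip_h1 and the sample markup are made up for illustration and are not taken from the real page:

```shell
# Hypothetical helper: keep only lines containing <h1>, then strip every
# tag and any leading/trailing whitespace.
strip_h1() {
  grep '<h1>' | sed -e 's/<[^>]*>//g' \
                    -e 's/^[[:space:]]*//' \
                    -e 's/[[:space:]]*$//'
}

# Assumed sample input standing in for the downloaded page.
sample='<div>
  <h1>1989</h1>
  <p>Taylor Swift</p>
  <h1>  x </h1>
</div>'

printf '%s\n' "$sample" | strip_h1
```

Against the live page this would be used as curl -s http://www.billboard.com/charts/billboard-200 | strip_h1, provided the markup still matches that assumption.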

2 answers:

Answer 0: (score: 0)

Here is something to get you going:

curl --silent http://www.billboard.com/charts/billboard-200 | awk -vRS="id=\"rank_" -F"\n" 'NR>1 {split($4,m," *</?h1>");melody=m[2];split($6,a,"\"");artist=a[4];print "("$1+0") "melody " - " artist}'
(1) 1989 - Taylor Swift
(2) x - Ed Sheeran
(3) The Pinkprint - Nicki Minaj
(4) In The Lonely Hour - Sam Smith
(5) SremmLife - Rae Sremmurd
(6) Hozier - Hozier
(7) 2014 Forest Hills Drive - J. Cole
(8) Guardians Of The Galaxy: Awesome Mix Vol. 1 - Soundtrack
(9) FOUR - One Direction
(10) My Everything - Ariana Grande
(11) Into The Woods - Soundtrack
(12) Montevallo - Sam Hunt
(13) Annie - Soundtrack
(14) Greatest Hits: Decade #1 - Carrie Underwood
(15) V - Maroon 5
(16) Frozen - Soundtrack
(17) Old Boots, New Dirt - Jason Aldean
(18) Reclassified - Iggy Azalea
(19) Native - OneRepublic
(20) 1000 Forms Of Fear - Sia
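One caveat on the one-liner: it sets RS to a multi-character string, which gawk and mawk accept but POSIX leaves unspecified. Below is a line-oriented sketch of the same idea that avoids that. It assumes the same layout the one-liner relies on (title on the 4th line of an entry, artist as the second quoted attribute on the 6th). The function names and the sample markup are hypothetical, written only to show the mechanics:

```shell
# Hypothetical sample page: two entries in the layout assumed above.
sample_page() {
cat <<'EOF'
<article id="rank_1">
  <div class="row">
    <span class="this-week">1</span>
    <h1>1989</h1>
    <p>
      <a href="/artist/371422/taylor-swift" title="Taylor Swift">Taylor Swift</a>
    </p>
  </div>
</article>
<article id="rank_2">
  <div class="row">
    <span class="this-week">2</span>
    <h1>x</h1>
    <p>
      <a href="/artist/276089/ed-sheeran" title="Ed Sheeran">Ed Sheeran</a>
    </p>
  </div>
</article>
EOF
}

# Hypothetical parser: count lines from the start of each entry
# instead of splitting records on a multi-character RS.
parse_chart() {
  awk '
    /id="rank_/ {                      # start of a new chart entry
      rank = $0
      sub(/.*id="rank_/, "", rank)     # keep text from the rank number on
      rank += 0                        # numeric coercion drops the markup
      n = 0
      next
    }
    rank { n++ }                       # lines seen since the entry started
    rank && n == 3 {                   # 4th line of the entry: the title
      title = $0
      gsub(/ *<\/?h1>/, "", title)
    }
    rank && n == 5 {                   # 6th line: artist in a quoted attribute
      split($0, a, "\"")
      print "(" rank ") " title " - " a[4]
      rank = 0                         # ignore lines until the next entry
    }
  '
}

sample_page | parse_chart
```

Like the original, this is tied to exact line positions, so any change in the page markup will break it.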

Answer 1: (score: 0)

$ for i in {0..9}; do
   saxon-lint --html --xquery '
     for $a in //article[@id]
        let $chart    := $a//span[@class="this-week"]/text()
        let $artist   := normalize-space($a//div[@class="row-title"]/h3/a/text())
        let $song     := normalize-space($a//h2/text())
        let $link     := string($a//div[@class="row-title"]/h3/a/@href)
     return
        concat(
          "[", $chart, "] ", $song, " - ", $artist,
             " : http://www.billboard.com", $link
         )
   ' "http://www.billboard.com/charts/billboard-200?page=$i"
done

Output:

[1] 1989 - Taylor Swift : http://www.billboard.com/artist/371422/taylor-swift
[2] x - Ed Sheeran : http://www.billboard.com/artist/276089/ed-sheeran
[3] The Pinkprint - Nicki Minaj : http://www.billboard.com/artist/312259/nicki-minaj
...
[200] Take Me Home - One Direction : http://www.billboard.com/artist/314021/one-direction
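If saxon-lint is not available, the paging part of the loop can be reused with any fetcher. A rough sketch, assuming the ?page=0 through ?page=9 URL scheme used above still applies; chart_pages is a made-up helper name, and the command to run for each URL is passed in as arguments so the loop can be dry-run without touching the network:

```shell
# Hypothetical helper: run the given command once per chart page URL.
# Uses a plain counter instead of {0..9}, which needs bash.
chart_pages() {
  i=0
  while [ "$i" -le 9 ]; do
    "$@" "http://www.billboard.com/charts/billboard-200?page=$i"
    i=$((i + 1))
  done
}

# Dry run: print the ten URLs instead of downloading them.
chart_pages echo
```

Calling chart_pages curl -s would then stream all ten pages to stdout, ready to be piped into whatever parser you prefer.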

Note