I'm a beginner currently working with the R package 'rvest'. My goal is to fetch the lyrics of any song from www.musixmatch.com. Here is my attempt:
library(rvest)
url <- "https://www.musixmatch.com/lyrics/Red-Hot-Chili-Peppers/Can-t-Stop"
musixmatch <- read_html(url)
lyrics <- musixmatch %>% html_nodes(".mxm-lyrics__content") %>% html_text()
This code creates a character vector 'lyrics' with 2 elements containing the lyrics:
[1] "Can't stop addicted to the shindig\nChop top he says I'm gonna win big\nChoose not a life of imitation"
[2] "Distant cousin to the reservation\n\nDefunkt the pistol that you pay for\nThis punk the feeling that you stay for\nIn time I want to be your best friend\nEastside love is living on the Westend\n\nKnock out but boy you better come to\nDon't die you know the truth is some do\nGo write your message on the pavement\nBurn so bright I wonder what the wave meant\n\nWhite heat is screaming in the jungle\nComplete the motion if you stumble\nGo ask the dust for any answers\nCome back strong with 50 belly dancers\n\nThe world I love\nThe tears I drop\nTo be part of\nThe wave can't stop\nEver wonder if it's all for you\nThe world I love\nThe trains I hop\nTo be part of\nThe wave can't stop\n\nCome and tell me when it's time to\n\nSweetheart is bleeding in the snow cone\nSo smart she's leading me to ozone\nMusic the great communicator\nUse two sticks to make it in the nature\nI'll get you into penetration\nThe gender of a generation\nThe birth of every other nation\nWorth your weight the gold ... <truncated>
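The problem is that the second element gets truncated at some point. As far as I know, rvest has no parameter to adjust the truncation, and I couldn't find anything about this problem online. Does anyone know how to adjust or disable the truncation? Thanks a lot in advance!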
Best regards,
Yang
Answer 0 (score: -1)
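I think the easiest way is to copy and paste the lyrics into Notepad or WordPad and save them as a .txt file.
Then use the readLines function. It prints a warning message, but I was able to get the entire lyrics into an 84x1 character vector, which you can clean up or process however you like.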
words <- readLines("redhot.txt")
> head(words)
[1] "Can't stop addicted to the shindig"
[2] "Chop top he says I'm gonna win big"
[3] "Choose not a life of imitation"
[4] "Distant cousin to the reservation"
[5] "Defunkt the pistol that you pay for"
[6] "This punk the feeling that you stay for"
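The warning mentioned above is not quoted, so its cause is an assumption here: readLines typically warns about an "incomplete final line" when a file does not end with a newline, which is common for files saved from a plain text editor. A minimal sketch of the same approach, using a temporary file with three lyric lines as a stand-in for redhot.txt:

```r
# Stand-in for redhot.txt: a temporary file WITHOUT a trailing newline
# (assumption: this is what triggers the warning described in the answer).
tmp <- tempfile(fileext = ".txt")
cat("Can't stop addicted to the shindig\nChop top he says I'm gonna win big\nChoose not a life of imitation",
    file = tmp)

# warn = FALSE suppresses the "incomplete final line" warning
words <- readLines(tmp, warn = FALSE)

# Drop empty lines (stanza breaks), if there are any
words <- words[nzchar(words)]

length(words)
head(words)
```

With the real file, replacing tmp with "redhot.txt" should give one vector element per lyric line.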
There is no truncation problem with this approach.
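It may also be worth checking whether the scraped string is truncated at all. The "... <truncated>" marker looks like a console display limit (RStudio, for example, shortens very long output lines) rather than something rvest does, although that is an assumption about the asker's setup. A sketch that uses a made-up long string in place of the scraped lyrics vector:

```r
# Hypothetical stand-in for the vector returned by html_text():
# the second element is 200 lines joined by "\n" into one long string.
long_text <- paste(sprintf("Line number %d of the song", 1:200), collapse = "\n")
lyrics <- c("Can't stop addicted to the shindig\nChop top he says I'm gonna win big",
            long_text)

# nchar() reports the real length of each element, regardless of how it prints
nchar(lyrics)

# Join the elements and split on newlines to get one line per element
lyric_lines <- unlist(strsplit(paste(lyrics, collapse = "\n"), "\n", fixed = TRUE))
lyric_lines <- lyric_lines[nzchar(lyric_lines)]  # drop blank stanza separators

length(lyric_lines)
tail(lyric_lines, 3)

# writeLines() prints the text line by line instead of as one escaped string
writeLines(head(lyric_lines, 3))
```

If nchar() on the real lyrics vector reports far more characters than the console shows, the data is complete and only the printed representation is shortened, so the copy-and-paste step would not be needed.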