Correct syntax to remove the concatenation characters from a list column in an R data frame

Time: 2018-07-22 16:46:59

Tags: r regex string

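I know there are plenty of posts and references on regex and gsub solutions, but I haven't been able to get any of them to work. Apologies if this is a duplicate, but I have been stuck on it for days.

I have a list of text in a data frame that looks like this: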

c("pop", "rap", "trap music")

I would like it to look like this instead, with the c, the quotes and the parentheses removed:

pop, rap, trap music

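I have tried many combinations of str_replace and gsub. I have also tried using tidyr to split the list into separate columns, but that splits values such as "trap music" across different columns. Thanks for your help.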

Edit: here is the str of the column I need help with:

> str(Artist_Genre_final$artist_genres)   
List of 100    
 $ : chr [1:5] "canadian hip hop" "canadian pop" "hip hop" "pop rap" ...  
 $ : chr [1:4] "hip hop" "pop rap" "rap" "west coast rap"  
 $ : chr [1:3] "pop" "rap" "trap music"  
 $ : chr [1:2] "pop" "rap"  
 $ : chr [1:4] "edm" "electropop" "pop" "tropical house"  

And here is the str of the whole data frame:

> str(Artist_Genre_final)
'data.frame':   100 obs. of  3 variables:
 $ Artist       : chr  "Drake" "Kendrick Lamar" "Lil Uzi Vert" "Post Malone" ...  
 $ Track        : chr  "One Dance" "HUMBLE." "XO TOUR Llif3" "rockstar" ...  
 $ artist_genres:List of 100  
  ..$ : chr  "canadian hip hop" "canadian pop" "hip hop" "pop rap" ...  
  ..$ : chr  "hip hop" "pop rap" "rap" "west coast rap"  
  ..$ : chr  "pop" "rap" "trap music"  

1 answer:

Answer 0: (score: 0)

I reproduced three rows of your data frame:

Artist <- c("Drake", "Kendrick Lamar", "Lil Uzi Vert")
Track <- c("One Dance", "HUMBLE.", "XO TOUR Llif3")
artist_genres <- list(c("canadian hip hop", "canadian pop", "hip hop", "pop rap"), 
                      c("hip hop", "pop rap", "rap", "west coast rap"),
                      c("pop", "rap", "trap music"))

Artist_Genre_final <- data.frame(Artist, Track, artist_genres=as.matrix(artist_genres), stringsAsFactors=FALSE)

Then check whether it matches your str() output:

str(Artist_Genre_final)

# 'data.frame': 3 obs. of  3 variables:
# $ Artist       : chr  "Drake" "Kendrick Lamar" "Lil Uzi Vert"
# $ Track        : chr  "One Dance" "HUMBLE." "XO TOUR Llif3"
# $ artist_genres:List of 3
#  ..$ : chr  "canadian hip hop" "canadian pop" "hip hop" "pop rap"
#  ..$ : chr  "hip hop" "pop rap" "rap" "west coast rap"
#  ..$ : chr  "pop" "rap" "trap music"

That looks right, so the next step is to print an element with cat(paste()):

cat(paste(Artist_Genre_final$artist_genres[[3]], collapse=", "))

# pop, rap, trap music

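You need to get at the atomic vector itself, hence the [[3]]. With single brackets you would get c("pop", "rap", "trap music"), because you would be passing a list of length 1 to paste() rather than the character vector it contains.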
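
The difference between the two kinds of brackets can be checked directly. Here is a minimal sketch, assuming a small stand-in list called `genres` (a hypothetical name, not part of the question's data):

```r
# `genres` is a hypothetical stand-in for the artist_genres list column
genres <- list(c("hip hop", "rap"), c("pop", "rap", "trap music"))

# [[ ]] extracts the character vector, so paste() joins its elements
joined <- paste(genres[[2]], collapse = ", ")

# [ ] keeps a list of length 1; paste() turns the whole element into
# one string that still carries the c(...) wrapper
deparsed <- paste(genres[2], collapse = ", ")

cat(joined, "\n")
cat(deparsed, "\n")
```

The first call should print the plain comma-separated genres, while the second keeps the c(...) text the question is trying to get rid of.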

Edit:

Here is a simple function that applies this to the whole list. There may well be smarter ways to do it, but this should at least get you started.

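paste_genres <- function(x) {
    # Collapse each character vector in the list x into one comma-separated string
    result <- character(length(x))
    # seq_along() is safe for an empty list, where 1:length(x) would yield c(1, 0)
    for (i in seq_along(x)) result[i] <- paste(x[[i]], collapse = ", ")
    return(result)
}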

temp <- paste_genres(Artist_Genre_final$artist_genres)
cat(temp, sep = "\n")

# canadian hip hop, canadian pop, hip hop, pop rap
# hip hop, pop rap, rap, west coast rap
# pop, rap, trap music
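
One of the possibly smarter ways hinted at above is to let vapply() do the looping and store the result as a new column. This is only a sketch under stated assumptions: `genres_df` and `genres_text` are hypothetical names, and the two rows are a cut-down stand-in for the question's data frame.

```r
# Hypothetical two-row stand-in for the question's data frame
genres_df <- data.frame(Artist = c("Lil Uzi Vert", "Post Malone"),
                        stringsAsFactors = FALSE)
genres_df$artist_genres <- list(c("pop", "rap", "trap music"),
                                c("pop", "rap"))

# vapply() calls paste() once per list element; character(1) makes it
# fail loudly unless each call returns exactly one string
genres_df$genres_text <- vapply(genres_df$artist_genres, paste,
                                character(1), collapse = ", ")

print(genres_df$genres_text)
```

Because the new column is an ordinary character vector, it prints without the c(...) wrapper and can be used with gsub or str_replace if further cleaning is needed.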