My dataset looks like this:
dataset = data.frame(Comments=c('Wow... Loved this place. 1','Crust is not good. 0','Not tasty and the texture was just nasty. 0'))
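I am trying to split the dataset into two columns, so that the first column contains only the text and the second column contains only the number at the end of each string.
Here is my attempt: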
library(dplyr)
library(tidyr)
dataset = dataset %>%
separate(Comments, into = c("Comment", "Score"), sep = " (?=[^ ]+$)")
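But I am not getting a clean split. I have looked at other solutions online, but have not had any luck.
Any help would be appreciated.
Answer 0 (score: 1)
Perhaps you could use substr and gsub: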
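library(dplyr)
dataset <- dataset %>%
  mutate(Comments = as.character(Comments)) %>%  # factor -> character, if needed
  mutate(Score = substr(Comments, nchar(Comments), nchar(Comments))) %>%  # last character is the score
  mutate(Comment = sub("\\s\\d$", "", Comments))  # strip only the trailing " <digit>"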
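The same idea can also be sketched in base R without dplyr. This is only a sketch, and it assumes every comment ends with whitespace followed by an integer score (one or more digits). If a row does not match that pattern, sub() returns the string unchanged and as.integer() yields NA with a warning.

```r
# Sample data from the question; stringsAsFactors = FALSE keeps Comments as character
dataset <- data.frame(
  Comments = c('Wow... Loved this place. 1',
               'Crust is not good. 0',
               'Not tasty and the texture was just nasty. 0'),
  stringsAsFactors = FALSE
)

# Keep only the trailing digits (captured group 1) and convert them to integer
dataset$Score <- as.integer(sub("^.*\\s(\\d+)$", "\\1", dataset$Comments))

# Drop the trailing whitespace + digits, leaving just the text
dataset$Comment <- sub("\\s+\\d+$", "", dataset$Comments)
```

Anchoring both patterns with $ means digits that appear earlier in a comment are left alone.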
Answer 1 (score: 0)
One solution is to use stringr functions:
library(dplyr)
library(stringr)
dataset %>%
  mutate(Score = str_extract(Comments, pattern = "\\d+$"),  # trailing digits only
         Comments = str_remove(Comments, pattern = "\\d+$") %>% str_trim())
#                                   Comments Score
#1                  Wow... Loved this place.     1
#2                        Crust is not good.     0
#3 Not tasty and the texture was just nasty.     0
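Since the question already loads tidyr, another option is tidyr::extract(), which splits a column into several using regex capture groups. The following is a sketch rather than a definitive answer; the regex is my own and assumes each comment ends with whitespace followed by an integer score. The name result is just illustrative.

```r
library(tidyr)

# Sample data from the question
dataset <- data.frame(
  Comments = c('Wow... Loved this place. 1',
               'Crust is not good. 0',
               'Not tasty and the texture was just nasty. 0'),
  stringsAsFactors = FALSE
)

# Group 1 = the text, group 2 = the trailing digits.
# convert = TRUE runs type conversion on the new columns, so Score becomes integer.
result <- tidyr::extract(
  dataset, Comments,
  into = c("Comment", "Score"),
  regex = "^(.*)\\s+(\\d+)$",
  convert = TRUE
)
```

Rows that do not match the regex would get NA in both new columns, which makes malformed comments easy to spot.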