R: Separate a column based on a pattern

Time: 2019-06-19 15:59:25

Tags: r tidyr

My dataset looks like this:

dataset = data.frame(Comments=c('Wow... Loved this place.   1','Crust is not good.  0','Not tasty and the texture was just nasty.   0'))

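I am trying to split the dataset into two columns, so that the first column contains only the text and the second column contains only the number at the end of each string.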

Here is my attempt:

library(dplyr)
library(tidyr)

dataset = dataset %>%
  separate(Comments, into = c("Comment", "Score"), sep = " (?=[^ ]+$)")

However, I am not getting a clean separation. I have looked at other solutions online, but have had no luck so far.

Any help would be appreciated.

2 answers:

Answer 0: (score: 1)

Perhaps you could use substr and gsub:

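library(dplyr)

dataset <- dataset %>%
  mutate(Comments = as.character(Comments)) %>%
  # the score is the last character of each string
  mutate(Score = substr(Comments, nchar(Comments), nchar(Comments))) %>%
  # drop the trailing whitespace run and digit, anchored at the end of the string
  mutate(Comment = gsub("\\s+\\d$", "", Comments))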
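The same idea can be sketched in base R without dplyr. This is only an illustration, assuming every comment ends with a run of whitespace followed by an integer score; the names `pattern` and `result` are made up for the example and do not come from the answer above.

```r
# Rebuild the example data as plain character strings (not factors).
dataset <- data.frame(
  Comments = c("Wow... Loved this place.   1",
               "Crust is not good.  0",
               "Not tasty and the texture was just nasty.   0"),
  stringsAsFactors = FALSE
)

# Group 1: the text (lazy match), then a whitespace run, then
# group 2: the digits that end the string.
pattern <- "^(.*?)\\s+(\\d+)$"

result <- data.frame(
  Comment = sub(pattern, "\\1", dataset$Comments, perl = TRUE),
  Score   = as.integer(sub(pattern, "\\2", dataset$Comments, perl = TRUE)),
  stringsAsFactors = FALSE
)

print(result)
```

Anchoring the pattern with `^` and `$` means digits that appear inside a comment are left alone; only the trailing number is moved to `Score`.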

Answer 1: (score: 0)

One solution is to use stringr functions:

library(dplyr)
library(stringr)

dataset %>%
  mutate(Score = str_extract(Comments, "\\d+$"),  # trailing digits only, as a plain vector
         Comments = str_trim(str_remove(Comments, "\\d+$")))

#                                   Comments Score
#1                  Wow... Loved this place.     1
#2                        Crust is not good.     0
#3 Not tasty and the texture was just nasty.     0
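For comparison, the `separate()` call from the question can probably be kept with a small change to `sep`. The original pattern consumes only a single space, so the extra spaces before the score stay attached to the comment. The sketch below assumes every score is a run of digits at the end of the string; `result` is an illustrative name.

```r
library(tidyr)

# Rebuild the example data as plain character strings (not factors).
dataset <- data.frame(
  Comments = c("Wow... Loved this place.   1",
               "Crust is not good.  0",
               "Not tasty and the texture was just nasty.   0"),
  stringsAsFactors = FALSE
)

# Split on the whole whitespace run that is followed only by digits
# up to the end of the string; convert = TRUE turns Score into an integer.
result <- separate(dataset, Comments,
                   into = c("Comment", "Score"),
                   sep = "\\s+(?=\\d+$)",
                   convert = TRUE)

print(result)
```

Because `\\s+` swallows the full whitespace run, no `str_trim()` step should be needed afterwards.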