Get the second item after using str_split() in R

Time: 2019-02-27 02:15:36

Tags: r dplyr stringr

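I have a data frame containing some questions. I want to remove the leading number and period from each question but keep everything else. I don't know regex very well, but this seemed like the perfect use for str_split(), especially inside a dplyr pipeline. However, after splitting the strings I'm not sure how to get the second item. I tried accessing it by position, but that didn't work.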

library(dplyr)
library(stringr)

x <- structure(list(question = c("01. I like my job.", 
                                 "02. I like my house.", 
                                 "03. I like my car.")), class = "data.frame", row.names = c(NA, -3L))

x %>% 
  mutate(words = str_split(question, "."))

This returns:

question                        words
01. I like my job.         <chr [19]>           
02. I like my house.       <chr [21]>           
03. I like my car.         <chr [19]>   
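A likely reason for the 19- and 21-element results: str_split() treats its pattern as a regular expression by default, and "." matches any single character, so every character acts as a separator and only empty strings are left. A minimal sketch comparing the unescaped pattern with an escaped one and with stringr's fixed(), using the first question as input:

```r
library(stringr)

s <- "01. I like my job."   # 18 characters

# "." is a regex wildcard: each of the 18 characters is a separator,
# which leaves 19 empty strings.
wildcard <- str_split(s, ".")[[1]]

# Escaping the period splits on the literal "." only.
escaped <- str_split(s, "\\.")[[1]]

# fixed() treats the pattern as a literal string instead of a regex.
literal <- str_split(s, fixed("."))[[1]]
```

Note that the escaped split still keeps the space after "01." and drops the final period into an empty trailing piece, which is why the answers below either use a substitution or limit the split.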

I want it to look like this:

question                             words
01. I like my job.         I like my job.           
02. I like my house.       I like my house.     
03. I like my car.         I like my car.

我也尝试使用separate()strsplit(),但是我也无法使用其中任何一个。

2 answers:

Answer 0 (score: 4)

I think you're looking for str_replace (or sub in base R):

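library(dplyr)
library(stringr)

x %>% mutate(words = str_replace(question, "^\\d+\\.\\s*", ""))
#              question            words
#1   01. I like my job.   I like my job.
#2 02. I like my house. I like my house.
#3   03. I like my car.   I like my car.

Explanation:

  1. ^ anchors the match to the start of the string
  2. \\d+\\. matches one or more digits followed by a literal period
  3. \\s* matches any whitespace after the period, so words has no leading space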
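The same substitution can be written with base R's sub(), which the answer mentions, or with stringr's str_remove(), a shorthand for str_replace(..., ""). A sketch on the question's data, with \\s* after the period so the following space is dropped as well:

```r
library(dplyr)
library(stringr)

x <- data.frame(question = c("01. I like my job.",
                             "02. I like my house.",
                             "03. I like my car."),
                stringsAsFactors = FALSE)

# ^ anchors at the start, \\d+ is the number, \\. the literal period,
# \\s* any whitespace that follows it.
base_words <- sub("^\\d+\\.\\s*", "", x$question)

# stringr equivalent inside a dplyr pipeline.
out <- x %>% mutate(words = str_remove(question, "^\\d+\\.\\s*"))
```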

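If you want to use str_split, you can split only once (n = 2) and take the second piece of each element with purrr's map_chr:

library(purrr)

x %>% mutate(words = map_chr(str_split(question, "\\.\\s*", n = 2), 2))

This produces the words column shown in the desired output.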
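Limiting the split with str_split()'s n argument matters when a question itself contains a period. A sketch using a hypothetical fourth question, "04. I like Mr. Smith.", which is not part of the original data:

```r
library(purrr)
library(stringr)

q <- "04. I like Mr. Smith."   # hypothetical question with an inner period

# Without n, every period becomes a split point and the sentence is cut up.
all_pieces <- str_split(q, "\\.\\s*")[[1]]

# With n = 2, only the first match splits; the rest of the string stays intact.
two_pieces <- str_split(q, "\\.\\s*", n = 2)[[1]]

# Across a vector, map_chr(..., 2) takes the 2nd piece of each element.
words <- map_chr(str_split(c("01. I like my job.", q), "\\.\\s*", n = 2), 2)
```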

Answer 1 (score: 2)

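You can change the pattern to "\\. " and then take the second element for each question. str_split_fixed() returns a matrix with one row per question, so column 2 holds the text after the number:

library(tidyverse)

x %>% 
  mutate(words = str_split_fixed(question, "\\. ", n = 2)[, 2])
#               question            words
# 1   01. I like my job.   I like my job.
# 2 02. I like my house. I like my house.
# 3   03. I like my car.   I like my car.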
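Why indexing with [[1]][[2]] repeats the first question's text: str_split() returns a list with one character vector per row, so [[1]] selects the first question's pieces and [[2]] the second of those. That single string is then recycled by mutate() to every row. The second piece has to be taken per element instead. A sketch with base R's vapply() (purrr::map_chr(parts, 2) would do the same):

```r
library(stringr)

questions <- c("01. I like my job.", "02. I like my house.", "03. I like my car.")
parts <- str_split(questions, "\\. ", n = 2)   # list of 3 character vectors

# Only the first question's second piece: a length-1 result.
first_only <- parts[[1]][[2]]

# Second piece of every question.
per_row <- vapply(parts, function(p) p[[2]], character(1))
```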