How to transform strings in a data frame

Asked: 2019-05-28 09:23:42

Tags: r tidyverse

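I have names such as "Robin the Bruyne" and "Victor from the Loo". These names are in a dataframe in my session. I need to change them to the form:

Lastname, Firstname Middlename

so that the order is turned around. But I don't know how to do this.

I know I could use something like separate() from the tidyverse, or map() from purrr.

Data: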

  ~nr,            ~name,        ~prodno,
  2019001,       "Piet de Boer", "lux_zwez",
  2019002,       "Elly Hamstra",  "zuv_vla",
  2019003, "Sue Ellen Schilder",  "zuv_vla",
  2019004,      "Truus Janssen", "zuv_vmlk",
  2019005,  "Evelijne de Vries", "lux_zwez",
  2019006,     "Berend Boersma", "lux_gras",
  2019007,   "Marius van Asten",  "zuv_vla",
  2019008,     "Corneel Jansen", "lux_gras",
  2019009,     "Joke Timmerman",  "zuv_vla",
  2019010, "Jan Willem de Jong", "lux_zwez",
  2019011,   "Frederik Janssen", "zuv_vmlk",
  2019012,   "Antonia de Jongh", "zuv_vmlk",
  2019013,   "Lena van der Loo",  "zuv_qrk",
  2019014,   "Johanna Haanstra", "lux_gras"

2 answers:

Answer 0: (score: 3)

We can try using sub here:

names <- c("Robin the Bruyne", "Victor from the Loo")
output <- sub("^(.*) ([A-Z][a-z]+)$", "\\2, \\1", names)
output

[1] "Bruyne, Robin the"    "Loo, Victor from the"

This approach uses the following pattern:

^(.*)          capture everything from the start until the last space
([A-Z][a-z]+)$ capture the last name, which starts with a capital

We then replace the whole string with the last name followed by the first and middle names, separated by a comma.
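To apply the same substitution to a data frame column rather than a standalone vector, something along these lines should work. This is only a minimal sketch: the data frame `dat` is a three-row subset of the question's data, and overwriting `dat$name` in place is an assumption about how the result should be stored.

```r
# Three rows taken from the question's data (a subset, for illustration only)
dat <- data.frame(
  nr     = c(2019001, 2019003, 2019013),
  name   = c("Piet de Boer", "Sue Ellen Schilder", "Lena van der Loo"),
  prodno = c("lux_zwez", "zuv_vla", "zuv_qrk"),
  stringsAsFactors = FALSE
)

# Group 1 is everything before the last space, group 2 is the final word;
# the replacement swaps the two groups and puts a comma between them.
dat$name <- sub("^(.*) ([A-Z][a-z]+)$", "\\2, \\1", dat$name)
dat$name
```

Note that sub returns the input unchanged when the pattern does not match, so a name whose last word contains, say, a hyphen or a digit would be left as it was.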

Answer 1: (score: 0)

If I understand you correctly, this should work.

dat = tibble::tribble(
  ~nr,            ~name,        ~prodno,
  2019001,       "Piet de Boer", "lux_zwez",
  2019002,       "Elly Hamstra",  "zuv_vla",
  2019003, "Sue Ellen Schilder",  "zuv_vla",
  2019004,      "Truus Janssen", "zuv_vmlk",
  2019005,  "Evelijne de Vries", "lux_zwez",
  2019006,     "Berend Boersma", "lux_gras",
  2019007,   "Marius van Asten",  "zuv_vla",
  2019008,     "Corneel Jansen", "lux_gras",
  2019009,     "Joke Timmerman",  "zuv_vla",
  2019010, "Jan Willem de Jong", "lux_zwez",
  2019011,   "Frederik Janssen", "zuv_vmlk",
  2019012,   "Antonia de Jongh", "zuv_vmlk",
  2019013,   "Lena van der Loo",  "zuv_qrk",
  2019014,   "Johanna Haanstra", "lux_gras"
)

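library(magrittr)  # provides the %>% pipe
dat %>% dplyr::mutate(
  # last word: alphanumerics at the end of the string, preceded by a blank
  lastname = stringr::str_extract(name, "(?<=[:blank:])[:alnum:]+$"),
  # everything before the last blank
  firstname = stringr::str_extract(name, ".*(?=[:blank:])"),
  name = paste(lastname, firstname, sep = ", ")
) %>% dplyr::select(-firstname, -lastname)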
#> # A tibble: 14 x 3
#>         nr name                prodno  
#>      <dbl> <chr>               <chr>   
#>  1 2019001 Boer, Piet de       lux_zwez
#>  2 2019002 Hamstra, Elly       zuv_vla 
#>  3 2019003 Schilder, Sue Ellen zuv_vla 
#>  4 2019004 Janssen, Truus      zuv_vmlk
#>  5 2019005 Vries, Evelijne de  lux_zwez
#>  6 2019006 Boersma, Berend     lux_gras
#>  7 2019007 Asten, Marius van   zuv_vla 
#>  8 2019008 Jansen, Corneel     lux_gras
#>  9 2019009 Timmerman, Joke     zuv_vla 
#> 10 2019010 Jong, Jan Willem de lux_zwez
#> 11 2019011 Janssen, Frederik   zuv_vmlk
#> 12 2019012 Jongh, Antonia de   zuv_vmlk
#> 13 2019013 Loo, Lena van der   zuv_qrk 
#> 14 2019014 Haanstra, Johanna   lux_gras

Created on 2019-06-02 by the reprex package (v0.2.1)
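The two lookaround patterns can also be tried on a plain character vector. The sketch below is not the code from this answer: it swaps in base R's `regmatches()` and `regexpr()` with `perl = TRUE` in place of `stringr::str_extract()`, and writes the character classes in the bracketed form `[[:blank:]]` that base R expects. The vector `x` is a small sample taken from the question's data.

```r
x <- c("Piet de Boer", "Jan Willem de Jong", "Elly Hamstra")

# Lookbehind: the final run of alphanumerics that is preceded by a blank
lastname <- regmatches(x, regexpr("(?<=[[:blank:]])[[:alnum:]]+$", x, perl = TRUE))

# Lookahead: greedy match of everything up to, but not including, the last blank
firstname <- regmatches(x, regexpr(".*(?=[[:blank:]])", x, perl = TRUE))

paste(lastname, firstname, sep = ", ")
```

One difference to keep in mind: `regmatches()` drops elements that have no match, whereas `str_extract()` returns NA for them, so this base R variant is only safe when every name contains at least one blank.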