Separate "Name" into "FirstName" and "LastName" columns of a data frame

Time: 2014-10-21 14:32:34

Tags: r strsplit

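I am trying to figure out how to split a single "NAME" column in a data frame into two additional columns, FirstName and LastName, in the same data frame. The challenge is that some of my names have several last names. Essentially, I want to take the first word (or element of the string) and put it in the FirstName column, then put all of the remaining text (minus the separating space, of course) in the LastName column.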

This is my data frame "tteam":

NAME <- c('John Doe','Peter Gynn','Jolie Hope-Douglas', 'Muhammad Arnab Halwai')
TITLE <- c("assistant", "manager", "assistant", "specialist")
tteam<- data.frame(NAME, TITLE)

My desired output looks like this:

FirstName <- c("John", "Peter", "Jolie", "Muhammad")
LastName <- c("Doe", "Gynn", "Hope-Douglas", "Arnab Halwai")
tteamdesire <- data.frame(FirstName, LastName, TITLE)

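I tried the following code to create a new data frame containing only the names, which lets me pull the first name from the first column. However, I cannot get the last names into any consistent order.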

names <- tteam$NAME ##  puts full names into names vector
namesdf <- data.frame(do.call('rbind', strsplit(as.character(names),' ',fixed=TRUE))) 
## splits out all names into a dataframe PROBLEM IS HERE!
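The likely reason the attempt above misbehaves: `strsplit` returns pieces of unequal length (two words for most names, three for "Muhammad Arnab Halwai"), and `rbind` recycles the shorter vectors to fill the extra column. The sketch below reproduces this on the sample data; the names `parts` and `m` are introduced here for illustration only.

```r
NAME <- c("John Doe", "Peter Gynn", "Jolie Hope-Douglas", "Muhammad Arnab Halwai")

# Split each full name on single spaces; the result is a list of character vectors
parts <- strsplit(NAME, " ", fixed = TRUE)

# Number of words per name: three names have 2 words, one has 3
lengths(parts)

# rbind pads the 2-word rows by recycling, so "John" shows up again in column 3
# (R also emits a warning here, suppressed for the demonstration)
m <- suppressWarnings(do.call(rbind, parts))
m[1, ]
```

Because of the recycling, the third column mixes real surname parts with repeated first names, which is why the last names cannot be recovered from it directly.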

4 answers:

Answer 0 (score: 7)

You can use extract from tidyr:

 library(tidyr)
 extract(tteam, NAME, c("FirstName", "LastName"), "([^ ]+) (.*)")
 #  FirstName     LastName      TITLE
 #1      John          Doe  assistant
 #2     Peter         Gynn    manager
 #3     Jolie Hope-Douglas  assistant
 #4  Muhammad Arnab Halwai specialist

Answer 1 (score: 4)

Try:

> firstname = sapply(strsplit(NAME, ' '), function(x) x[1])
> firstname 
[1] "John"     "Peter"    "Jolie"    "Muhammad"

> lastname = sapply(strsplit(NAME, ' '), function(x) x[length(x)])
> lastname
[1] "Doe"          "Gynn"         "Hope-Douglas" "Halwai"      

Or:

> ll = strsplit(NAME, ' ')
> 
> firstname = sapply(ll, function(x) x[1])
> lastname = sapply(ll, function(x) x[length(x)])
> 
> firstname
[1] "John"     "Peter"    "Jolie"    "Muhammad"
> lastname
[1] "Doe"          "Gynn"         "Hope-Douglas" "Halwai"      
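Note that `x[length(x)]` keeps only the final word, so "Muhammad Arnab Halwai" yields "Halwai" rather than the "Arnab Halwai" asked for in the question. A minimal sketch of a variant that keeps everything after the first word, assuming the words are separated by single spaces (the name `parts` is introduced here for illustration):

```r
NAME <- c("John Doe", "Peter Gynn", "Jolie Hope-Douglas", "Muhammad Arnab Halwai")
TITLE <- c("assistant", "manager", "assistant", "specialist")
tteam <- data.frame(NAME, TITLE, stringsAsFactors = FALSE)

# as.character() guards against NAME being stored as a factor
parts <- strsplit(as.character(tteam$NAME), " ", fixed = TRUE)

# First word of each name
firstname <- vapply(parts, function(x) x[1], character(1))

# Everything except the first word, glued back together with spaces
lastname <- vapply(parts, function(x) paste(x[-1], collapse = " "), character(1))

tteamdesire <- data.frame(FirstName = firstname, LastName = lastname,
                          TITLE = tteam$TITLE, stringsAsFactors = FALSE)
tteamdesire
```

With this approach a name consisting of a single word gets an empty string as its last name.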

Answer 2 (score: 3)

1) sub

data.frame(FirstName = sub(" .*", "", tteam$NAME), 
           LastName = sub("^\\S* ", "", tteam$NAME),
           tteam[-1])
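To make the two patterns easier to follow, here is a small sketch applying them to the sample names on their own. The variable names `first`, `last`, and `single` are illustrative and not part of the answer.

```r
NAME <- c("John Doe", "Peter Gynn", "Jolie Hope-Douglas", "Muhammad Arnab Halwai")

# " .*" matches the first space and everything after it; removing it leaves the first word
first <- sub(" .*", "", NAME)

# "^\\S* " matches the leading run of non-space characters plus one space;
# removing it leaves the rest of the name, inner spaces included
last <- sub("^\\S* ", "", NAME)

# Edge case: with no space, neither pattern matches and sub() returns the input unchanged
single <- "Cher"
c(sub(" .*", "", single), sub("^\\S* ", "", single))
```

So a one-word name would end up in both columns; if such names can occur in the data, they need separate handling.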

2) gsubfn::read.pattern. The `NAME <- as.character(tteam$NAME)` line can be omitted if NAME is already character (as opposed to factor):

library(gsubfn)  # provides read.pattern

cn <- c("FirstName", "LastName")
NAME <- as.character(tteam$NAME)

cbind( read.pattern(text = NAME, pattern = "^(\\S*) (.*)", col.names = cn), tteam[-1])

Update: Revised the solution so that it works with tteam, and added the second solution.

Answer 3 (score: 0)

You can use the unglue package:

library(unglue)
unglue_unnest(tteam, NAME, "{FirstName} {LastName}")
#>        TITLE FirstName     LastName
#> 1  assistant      John          Doe
#> 2    manager     Peter         Gynn
#> 3  assistant     Jolie Hope-Douglas
#> 4 specialist  Muhammad Arnab Halwai