Separating the surname from a full name

Date: 2018-12-23 11:00:57

Tags: r string

I used this:

String <- unlist(str_split(Invname,"[ ]",n=2))

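to split my names into surname and first name, since the surname comes first. But I can't figure out how to assign the pieces of the split Invname to two lists, so that I can work with only the surname in the rest of my project. Right now I have this: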

[471] "KRUEGER" "MARCUS"

I want to assign only the left-hand part to a new variable so that I can mine the surnames further for information.

6 answers:

Answer 0 (score: 2)

Using the data from nate.edwinton's answer, there is no need for unlist:

Invnames <- c("Krueger Markus","Doe John","Tatum Jayson")

String <- stringr::str_split(Invnames, "[ ]", n = 2)
Surnames <- sapply(String, '[', 1)
Firstnames <- sapply(String, '[', 2)
data.frame(Surnames, Firstnames)
#  Surnames Firstnames
#1  Krueger     Markus
#2      Doe       John
#3    Tatum     Jayson
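The same idea can be sketched without stringr. This is a minimal base R variant, not part of the original answer; note that `strsplit` has no `n = 2` argument, so a name with more than two space-separated pieces would yield more than two parts, and only the first two are used here.

```r
# Same sample data as in the answer above
Invnames <- c("Krueger Markus", "Doe John", "Tatum Jayson")

# Split every name on a literal space; the result is a list of character vectors
parts <- strsplit(Invnames, " ", fixed = TRUE)

# vapply() guarantees that each extracted piece is a single character string
Surnames   <- vapply(parts, function(p) p[1], character(1))
Firstnames <- vapply(parts, function(p) p[2], character(1))

data.frame(Surnames, Firstnames)
```

If only the surname is needed for the rest of the project, `Surnames` is already the vector to work with.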

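Answer 1 (score: 1)

As noted in the comments, it is easier to get help if you provide some data. In any case, this may be a solution:

Assuming Invnames is a vector in which each first name comes with (exactly) one surname, you can do the following: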

# data
Invnames <- c("Krueger Markus","Doe John","Tatum Jayson")
# extraction
String <- unlist(stringr::str_split(Invnames,"[ ]",n=2))
# saving first and last names
lastNames <- String[seq(1,length(String),2)]
firstNames <- String[seq(2,length(String),2)]
# yields
cbind(lastNames,firstNames)
     lastNames firstNames
[1,] "Krueger" "Markus"  
[2,] "Doe"     "John"    
[3,] "Tatum"   "Jayson"  
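The odd/even indexing above depends on every name splitting into exactly two pieces; a single-word entry would shift all later elements by one. As a hedged sketch of one way around that (the entry "Madonna" is made-up test data, not from the question), the first space can be located per name in base R:

```r
# Sample data plus one hypothetical single-word name
Invnames <- c("Krueger Markus", "Doe John", "Madonna")

# Position of the first space in each name (-1 when there is none);
# as.integer() drops the extra attributes that regexpr() attaches
pos <- as.integer(regexpr(" ", Invnames, fixed = TRUE))

# Everything before the first space, or the whole string if there is no space
lastNames <- ifelse(pos > 0, substr(Invnames, 1, pos - 1), Invnames)

# Everything after the first space, or NA if there is no space
firstNames <- ifelse(pos > 0,
                     substr(Invnames, pos + 1, nchar(Invnames)),
                     NA_character_)

cbind(lastNames, firstNames)
```

Because each name is handled on its own, a malformed entry only affects its own row.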

Answer 2 (score: 1)

Here is some sample data and a suggested solution. The data is modified from @Rui Barradas's answer:

Invnames <- c("Krueger.$Markus","Doe.John","Tatum.Jayson")
# split on runs of non-word characters and keep the first piece (the surname)
sapply(strsplit(Invnames, "\\W+"), "[", 1)

Answer 3 (score: 1)

Again using the data from an earlier answer, this time with the tidyverse (separate() comes from tidyr):

library(tidyverse)

Invnames <- c("Krueger Markus","Doe John","Tatum Jayson")
Invnames <- data.frame(Invnames)

Invnames %>%
  separate(Invnames, c('Surname', 'FirstName'), sep=" ")

  Surname FirstName
1 Krueger    Markus
2     Doe      John
3   Tatum    Jayson

Answer 4 (score: 1)

With base R, we can use read.table/read.csv to split the strings into columns:

read.table(text = Invnames, header = FALSE, col.names = c("Surnames", "Firstnames"))
#  Surnames Firstnames
#1  Krueger     Markus
#2      Doe       John
#3    Tatum     Jayson

Data

Invnames <- c("Krueger Markus","Doe John","Tatum Jayson")

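Answer 5 (score: 0)

If only names were that simple! The other answers work when the strings have few complications. In my experience with lists of names, we get hyphenated names (both first names and surnames), middle names, titles and abbreviated forms (Dr., Mr, Md), and many other variations. I try to clean the strings before splitting.

This is just one idea using dplyr (the code is written out explicitly for clarity):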

Invnames <- c("Krueger Markus","Doe John","Tatum Jayson", "Taylor - Cline Jeff", "Davis - Freud Melvin- John")

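library(dplyr)

df <- data.frame(Invnames = Invnames, stringsAsFactors = FALSE) %>%
  mutate(Invnames2 = gsub("- ", "-", Invnames)) %>%   # remove the space after a hyphen
  mutate(Invnames2 = gsub(" -", "-", Invnames2)) %>%  # remove the space before a hyphen
  mutate(surname = gsub(" .*", "", Invnames2))        # keep everything before the first space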
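The cleaning steps can also be wrapped in a small function. The following is only a sketch in base R: `extract_surname` and its `titles` argument are hypothetical names introduced for illustration, the title list is an assumption and far from complete, and "Dr. Doe John" is made-up test data.

```r
# Hypothetical helper; the default title list is an illustrative assumption
extract_surname <- function(x, titles = c("Dr", "Mr", "Mrs", "Ms")) {
  # Drop a leading title such as "Dr." or "Mr" when it is followed by whitespace
  title_pattern <- paste0("^(", paste(titles, collapse = "|"), ")\\.?\\s+")
  x <- sub(title_pattern, "", trimws(x))
  # Remove spaces around hyphens so "Taylor - Cline" becomes "Taylor-Cline"
  x <- gsub("\\s*-\\s*", "-", x)
  # Keep everything before the first whitespace character
  sub("\\s.*$", "", x)
}

Invnames <- c("Krueger Markus", "Taylor - Cline Jeff",
              "Davis - Freud Melvin- John", "Dr. Doe John")
surnames <- extract_surname(Invnames)
surnames
# [1] "Krueger"      "Taylor-Cline" "Davis-Freud"  "Doe"
```

Keeping the cleanup in one function makes it easier to add further rules (middle names, suffixes) as new variations turn up in the data.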