I'm using R for data manipulation. I have a long list of names that looks like this:
"names"
[1] ""
[2] "Victoria Marie"
[3] "Ori Mann"
[4] "Lina Pearl Right"
[5] "David Berg"
[6] "Anthony Lee"
[7] "Brian Michael Ingraham"
[8] "Jay Ling"
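I want to extract only the first and last name from each entry in the list into new columns and discard any middle names. How can I do that? I tried the following code: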
mat = matrix(unlist(names), ncol=2, byrow=TRUE)
This just runs through all the names in every entry and dumps them into the columns in order.
Any help is greatly appreciated.
Answer 0 (score: 1)
Here is a way to do this in base R that also handles the possibility of suffixes. If you find other suffixes (e.g., 'II'), you can add them to the vector after %in%.
# some representative data
names <- list("", "Ed Smith", "Jennifer Jason Leigh", "Ed Begley, Jr.")

# use strsplit to get a list of vectors of each name broken into its parts,
# keying off the space between names
names.split <- strsplit(unlist(names), " ")

# make new vectors with the first and last names, based on their position in
# those vectors. for last names, make the result conditional on whether or
# not a recognized suffix is in the last spot, and get rid of any
# punctuation attached to the last name if there was a suffix.
name.first <- sapply(names.split, function(x) x[1])
name.last <- sapply(names.split, function(x) {
  # this deals with empty name slots in your original list, returning NA
  if (length(x) == 0) {
    NA
  # now check for a suffix; if one is there, use the penultimate item
  # after stripping it of any punctuation
  } else if (x[length(x)] %in% c("Jr.", "Jr", "Sr.", "Sr")) {
    gsub("[[:punct:]]", "", x[length(x) - 1])
  } else {
    x[length(x)]
  }
})
Result:
> name.first
[1] NA "Ed" "Jennifer" "Ed"
> name.last
[1] NA "Smith" "Leigh" "Begley"
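Since the question asks for the first and last names in new columns, the same steps can be wrapped in a function that returns a data frame. What follows is only a sketch: the helper name `split_first_last`, its `suffixes` argument, and the extra "II" and "III" entries are assumptions for illustration and are not part of the answer above. It uses `vapply` instead of `sapply` so that the result is always a character vector.

```r
# Hypothetical helper (not from the original answer): same logic as above,
# packaged so that the result lands in two columns of a data frame.
split_first_last <- function(full_names,
                             suffixes = c("Jr.", "Jr", "Sr.", "Sr", "II", "III")) {
  # break each name into its space-separated parts
  parts <- strsplit(unlist(full_names), " ")

  # first name: first part (NA for empty entries)
  first <- vapply(parts, function(x) x[1], character(1))

  # last name: last part, or the part before a recognized suffix
  last <- vapply(parts, function(x) {
    n <- length(x)
    if (n == 0) return(NA_character_)
    if (n > 1 && x[n] %in% suffixes) {
      return(gsub("[[:punct:]]", "", x[n - 1]))
    }
    x[n]
  }, character(1))

  data.frame(first = first, last = last, stringsAsFactors = FALSE)
}

# the sample data from the question
people <- c("", "Victoria Marie", "Ori Mann", "Lina Pearl Right",
            "David Berg", "Anthony Lee", "Brian Michael Ingraham", "Jay Ling")
result <- split_first_last(people)
print(result)
```

Note that a one-word entry ends up with the same value in both columns, and multi-word surnames (e.g. "van der Berg") would be reduced to their final word, so real data may need additional rules.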