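Match titles using a regular expression. Write an R snippet that creates a new column named "Female" and fills it with TRUE/FALSE values based on the text in the "Name" column. For example, the value should be TRUE for "Miss" and NA when the name has no title.
Here is the data frame: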
df <- data.frame(PersonID=1:8, Name=c("Mr. Bob", "Ms. Blank", "Roger, Mr.", "MR Mark Simpson", "Miss Lisa", "Mrs. joshep", "Rakesh Kumar", "Kumar Gums Murphy"))
grepl("Miss", df, perl=TRUE)
Output:
FALSE,FALSE,FALSE
Expected output:
FALSE,FALSE,FALSE,FALSE,TRUE,FALSE,NA,NA
Can someone help me?
Answer 0 (score: 1)
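If you want NA for names that carry no title, you first have to rule out the presence of every other title. That is, just because "Miss" is absent does not mean that "Mr" or "MISS" is absent too.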
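To illustrate the point, here is a minimal sketch on the data from the question. It matches "Miss" against the `Name` column (not the whole data frame) and shows that the two names without any title come back as FALSE rather than NA. The variable name `miss_only` is only illustrative.

```r
df <- data.frame(PersonID = 1:8,
                 Name = c("Mr. Bob", "Ms. Blank", "Roger, Mr.", "MR Mark Simpson",
                          "Miss Lisa", "Mrs. joshep", "Rakesh Kumar", "Kumar Gums Murphy"))

# grepl() on the Name column returns one logical value per row
miss_only <- grepl("Miss", df$Name)
print(miss_only)

# Rows 7 and 8 have no title at all, yet they are FALSE, not NA
print(miss_only[7:8])
```

A single `grepl()` call therefore cannot tell "has a different title" apart from "has no title", which is why a second check over all known titles is needed.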
For your example, the following assigns "M", "F", or NA. Add more titles as needed.
Titles <- c("Miss", "Ms","Mr","Mrs","MR","MS","MRS","MISS") # vector of possible titles
f.Titles <- c("Miss", "Ms","Mrs","MS","MRS","MISS") # vector of female specific titles
check <- NULL
for(i in 1:length(Titles)){
check <- cbind(check,grepl(Titles[i], df$Name, perl=TRUE))
}
colnames(check) <- Titles
apply(check,1,function(x)ifelse(!any(x),NA,
ifelse(any(names(which(x)) %in% f.Titles),"F","M")))
Output:
[1] "M" "F" "M" "M" "F" "F" NA NA
From there:
G <- apply(check,1,function(x)ifelse(!any(x),NA,
ifelse(any(names(which(x)) %in% f.Titles),"F","M")))
df$Female <- ifelse(G=="F",TRUE,ifelse(is.na(G),NA,FALSE))
df
  PersonID              Name Female
1        1           Mr. Bob  FALSE
2        2         Ms. Blank   TRUE
3        3        Roger, Mr.  FALSE
4        4   MR Mark Simpson  FALSE
5        5         Miss Lisa   TRUE
6        6       Mrs. joshep   TRUE
7        7      Rakesh Kumar     NA
8        8 Kumar Gums Murphy     NA
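As a side note, the nested `ifelse()` that converts "M"/"F"/NA into logical values can probably be shortened, because comparing NA with `==` already yields NA in R. A small sketch, with `G` written out by hand to mirror the output shown earlier:

```r
# G as produced by the apply() step above, typed in by hand for this sketch
G <- c("M", "F", "M", "M", "F", "F", NA, NA)

# NA == "F" evaluates to NA, so no explicit is.na() branch is needed
Female <- G == "F"
print(Female)
```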
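Here is a more efficient version that does exactly what you asked for. You still have to specify all possible titles (Titles) and the female-specific titles (f.Titles).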
check <- apply(as.matrix(Titles), 1, function(x) grepl(x, df$Name, perl=TRUE))
colnames(check) <- Titles
df$Female <- apply(check,1,function(x)ifelse(!any(x),NA,ifelse(any(names(which(x)) %in% f.Titles),TRUE,FALSE)))
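For comparison, here is a sketch of an alternative that is not part of the answer above: it folds the title lists into two regular expressions and relies on word boundaries (`\\b`) and `ignore.case = TRUE` instead of listing every capitalization. The pattern names `female_pat` and `any_pat` are hypothetical, and the set of titles is an assumption taken from the example data.

```r
df <- data.frame(PersonID = 1:8,
                 Name = c("Mr. Bob", "Ms. Blank", "Roger, Mr.", "MR Mark Simpson",
                          "Miss Lisa", "Mrs. joshep", "Rakesh Kumar", "Kumar Gums Murphy"))

# Hypothetical patterns; \\b is a word boundary, so "mr" does not match inside "Mrs"
female_pat <- "\\b(miss|ms|mrs)\\b"
any_pat    <- "\\b(mr|miss|ms|mrs)\\b"

# One vectorized grepl() call per pattern, case-insensitive
is_female <- grepl(female_pat, df$Name, ignore.case = TRUE, perl = TRUE)
has_title <- grepl(any_pat, df$Name, ignore.case = TRUE, perl = TRUE)

# NA where no title was found; otherwise TRUE for female titles, FALSE for the rest
df$Female <- ifelse(has_title, is_female, NA)
print(df)
```

The word boundaries also keep substrings such as the "ms" in "Gums" from being counted as a title, which plain substring matching with `ignore.case = TRUE` would not guarantee.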