I have full-name data, and I use strsplit() to get each element of a name.
# Dataframe with a `names` column (complete names)
df <- data.frame(
names =
c("Adam, R, Goldberg, MALS, MBA",
"Adam, R, Goldberg, MEd",
"Adam, S, Metsch, MBA",
"Alan, Haas, MSW",
"Alexandra, Dumas, Rhodes, MA",
"Alexandra, Ruttenberg, PhD, MBA"),
stringsAsFactors=FALSE)
# Add a column with the split names (it is actually a list)
df$splitnames <- strsplit(df$names, ', ')
I also have a list of degrees:
degrees<-c("EdS","DEd","MEd","JD","MS","MA","PhD","MSPH","MSW","MSSA","MBA",
"MALS","Esq","MSEd","MFA","MPA","EdM","BSEd")
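I want to get the intersection of each name with the degrees.
I'm not sure how to flatten the list of names so that I can use intersect to compare the two vectors. When I tried unlist(df$splitnames, recursive=F), it returned every element separately. Any help is appreciated.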
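Answer 0: (score: 3)
Try
df$intersect <- lapply(X=df$splitnames, FUN=intersect, y=degrees)
This gives you a list holding the intersection for each element of df$splitnames (for example, the first entry is intersect(df$splitnames[[1]], degrees)). If you want it as a vector: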
sapply(X=df$intersect, FUN=paste, collapse=', ')
I assume you need it as a vector because the full names presumably come from one (for example, a data frame column), whereas strsplit outputs a list.
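The two steps above can be put together in one self-contained sketch. It rebuilds the question's data, and it swaps sapply for vapply so that the result is guaranteed to be a character vector; the names `parts`, `found`, and `degree_string` are illustrative and not from the original post.

```r
# Rebuild the question's data so the sketch runs on its own
df <- data.frame(
  names = c("Adam, R, Goldberg, MALS, MBA",
            "Adam, R, Goldberg, MEd",
            "Adam, S, Metsch, MBA",
            "Alan, Haas, MSW",
            "Alexandra, Dumas, Rhodes, MA",
            "Alexandra, Ruttenberg, PhD, MBA"),
  stringsAsFactors = FALSE)
degrees <- c("EdS", "DEd", "MEd", "JD", "MS", "MA", "PhD", "MSPH", "MSW",
             "MSSA", "MBA", "MALS", "Esq", "MSEd", "MFA", "MPA", "EdM", "BSEd")

# Split each full name into its parts (a list with one vector per row)
parts <- strsplit(df$names, ", ", fixed = TRUE)

# Intersect each row's parts with the degree list
found <- lapply(parts, intersect, y = degrees)

# Collapse each row's degrees into one string; vapply enforces the
# "one character value per row" shape (illustrative column name)
df$degree_string <- vapply(found, paste, FUN.VALUE = character(1),
                           collapse = ", ")
print(df[, c("names", "degree_string")])
```

Rows without any matching degree would end up as an empty string here, which may or may not be what you want.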
Does this help? If not, please try to clarify what you are after.
Good luck!
Answer 1: (score: 0)
To flatten the list, you can use unlist:
hh <- unlist(df$splitnames)
intersect(hh,degrees)
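For example:
ll <- list(c("Adam" , "R" , "Goldberg" ,"MALS" , "MBA "),
c("Adam" , "R" , "Goldberg", "MEd" ))
# Flatten the example list; note that "MBA " has a trailing space, so it does not match "MBA"
hh <- unlist(ll)
intersect(hh,degrees)
[1] "MALS" "MEd"
Or, for this input, equivalently (unlike intersect, %in% keeps duplicates):
hh[hh %in% degrees]
[1] "MALS" "MEd"
To get the difference, you can use
setdiff(hh,degrees)
[1] "Adam" "R" "Goldberg" "MBA "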
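One caveat with the flattened approach is that unlist drops the information about which name each degree came from, and stray whitespace such as the trailing space in "MBA " prevents a match. A minimal sketch of one way around both, assuming base R 3.2.0 or later for trimws and lengths; the names `row_id`, `is_degree`, and `per_row`, and the shortened degree list, are illustrative only:

```r
ll <- list(c("Adam", "R", "Goldberg", "MALS", "MBA "),
           c("Adam", "R", "Goldberg", "MEd"))
degrees <- c("MEd", "MBA", "MALS")  # shortened list, for illustration only

# Flatten and strip surrounding whitespace so "MBA " becomes "MBA"
hh <- trimws(unlist(ll))

# Remember which list element each flattened value came from
row_id <- rep(seq_along(ll), lengths(ll))

# Keep only the degrees, then regroup them by their original element;
# the explicit factor levels keep elements that have no degree at all
is_degree <- hh %in% degrees
per_row <- split(hh[is_degree],
                 factor(row_id[is_degree], levels = seq_along(ll)))
print(per_row)
```

This ends up equivalent to calling intersect on each element, so the lapply version is usually simpler; the sketch is mainly useful if you already work with the flattened vector.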
...