Need to flatten a list to use intersect in R

Time: 2013-02-20 04:17:09

Tags: r list set-intersection

I have data containing full names, and I use strsplit() to get each element of a name.

# Dataframe with a `names` column (complete names)
df <- data.frame(
    names =
          c("Adam, R, Goldberg, MALS, MBA", 
          "Adam, R, Goldberg, MEd", 
          "Adam, S, Metsch, MBA", 
          "Alan, Haas, MSW", 
          "Alexandra, Dumas, Rhodes, MA", 
          "Alexandra, Ruttenberg, PhD, MBA"),
    stringsAsFactors=FALSE)

# Add a column with the split names (it is actually a list)
df$splitnames <- strsplit(df$names, ', ')

I also have a vector of degrees:

degrees<-c("EdS","DEd","MEd","JD","MS","MA","PhD","MSPH","MSW","MSSA","MBA",
           "MALS","Esq","MSEd","MFA","MPA","EdM","BSEd")

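I want to get the intersection of each name with the degrees.

I'm not sure how to flatten the list so that I can use intersect to compare the two vectors. When I tried unlist(df$splitnames, recursive=F), it returned every element separately. Any help is appreciated.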

2 Answers:

Answer 0: (score: 3)

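Try

df$intersect <- lapply(X=df$splitnames, FUN=intersect, y=degrees)

This gives you a list holding the intersection for each element of df$splitnames (for example, intersect(df$splitnames[[1]], degrees) for the first row). If you want it as a vector: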

sapply(X=df$intersect, FUN=paste, collapse=', ')

I'm assuming you need it as a vector because the full names presumably came from one (for example, a data frame column), whereas strsplit outputs a list.
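
As a rough, self-contained sketch of the two steps above on the question's sample data: the list column holds one character vector per row, and collapsing each element with paste yields a plain character vector. The use of vapply() in place of sapply() and the column name degree_string are choices made here for illustration, not part of the original answer.

```r
# Sample data from the question
df <- data.frame(
    names = c("Adam, R, Goldberg, MALS, MBA",
              "Adam, R, Goldberg, MEd",
              "Adam, S, Metsch, MBA",
              "Alan, Haas, MSW",
              "Alexandra, Dumas, Rhodes, MA",
              "Alexandra, Ruttenberg, PhD, MBA"),
    stringsAsFactors = FALSE)
degrees <- c("EdS", "DEd", "MEd", "JD", "MS", "MA", "PhD", "MSPH", "MSW",
             "MSSA", "MBA", "MALS", "Esq", "MSEd", "MFA", "MPA", "EdM", "BSEd")

# One character vector of name parts per row (a list column)
df$splitnames <- strsplit(df$names, ", ")

# Per-row intersection: still a list, one element per row
df$intersect <- lapply(df$splitnames, intersect, y = degrees)

# Collapse each row's degrees into a single string. vapply() (an illustrative
# choice) guarantees a character vector with exactly one string per row.
df$degree_string <- vapply(df$intersect, paste, character(1), collapse = ", ")

print(df$degree_string)
```

A row with no matching degree would end up as an empty string, since pasting an empty vector with collapse returns "".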

Does this help? If not, please try to clarify what you're after.

Good luck!

Answer 1: (score: 0)

To flatten the list, you can use unlist:

hh <- unlist(df$splitnames)
intersect(hh, degrees)

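For example:

# Note: "MBA " has a trailing space, so it does not match "MBA" in degrees
ll <- list(c("Adam", "R", "Goldberg", "MALS", "MBA "),
           c("Adam", "R", "Goldberg", "MEd"))
hh <- unlist(ll)

intersect(hh, degrees)
[1] "MALS" "MEd"

Or, equivalently here (unlike intersect, this keeps duplicates):

hh[hh %in% degrees]
[1] "MALS" "MEd"

To get the difference, you can use

setdiff(hh, degrees)
[1] "Adam"     "R"        "Goldberg" "MBA "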
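
Two points about this example may be worth spelling out, sketched below under the assumption that the trailing space in "MBA " is unintended. First, unlist() pools the name parts of all people into one vector, so the result no longer says which degree belongs to whom. Second, the set functions compare strings exactly, so "MBA " does not match "MBA". Trimming with trimws() (base R 3.2.0 or later) before comparing is one possible workaround, not something the original answer proposes.

```r
degrees <- c("EdS", "DEd", "MEd", "JD", "MS", "MA", "PhD", "MSPH", "MSW",
             "MSSA", "MBA", "MALS", "Esq", "MSEd", "MFA", "MPA", "EdM", "BSEd")

ll <- list(c("Adam", "R", "Goldberg", "MALS", "MBA "),
           c("Adam", "R", "Goldberg", "MEd"))

# Flattened: a single pooled vector, so the per-person grouping is lost
hh <- unlist(ll)
pooled <- intersect(trimws(hh), degrees)

# Per person: trim each vector, then intersect, keeping the list structure
per_person <- lapply(ll, function(parts) intersect(trimws(parts), degrees))

print(pooled)
print(per_person)
```

If the degrees need to stay attached to each person, the per-person form is the one to use. The pooled form only answers which degrees occur anywhere in the data.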

...