R: Counting tuples in a data frame

Time: 2018-06-25 11:51:52

Tags: r dataframe

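I hope the title is broad but still clear enough. Any hints or help are much appreciated.

I have a data frame in R with three columns and about 70k rows. I need to do the following:

Suppose I treat the first two columns of a row as a tuple (x, y) and the third column z as a "key" associated with that tuple. I want to count how often a given tuple (x, y) is associated with z, where a tuple (x, y) may be associated with more than one key z. This is only the first step in analyzing the map (x, y) -> z, but I think it would be handy to have a data frame that tells me how often (x, y) is associated with z.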

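For example:

Adam, Germany, Accounting

Bert, Austria, Operations

Bert, Germany, HR

Adam, Germany, HR

I would like to see:

HR -> Bert, Germany; Adam, Germany

Operations -> Bert, Austria

Accounting -> Adam, Germany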

2 answers:

Answer 0: (score: 0)

Well, here goes.

library(data.table)  # stops with an error if data.table is not installed

# your data
dt1 <- data.table(name = c("Adam", "Bert","Bert", "Adam"),
                 cntry = c("Germany",  "Austria","Germany","Germany"),
                 occ = c("Accounting","Operations","HR", "HR"))


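# cast to wide format: one row per (cntry, name), one count column per occ
dt2 <- dcast(dt1, cntry + name ~ occ, value.var = "occ", fun.aggregate = length)

# add a grand total across all occ columns (everything except the two id columns)
dt2[, Total := rowSums(.SD), .SDcols = colnames(dt2)[-(1:2)]]


# output of printing dt2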
     cntry name Accounting HR Operations Total
1: Austria Bert          0  0          1     1
2: Germany Adam          1  1          0     2
3: Germany Bert          0  1          0     1
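
The wide table above has one column per key. For roughly 70k rows and many distinct keys, a long format may be easier to work with. The following is only a sketch, assuming the same example data and column names as in this answer; `counts` and `keys_per_tuple` are names introduced here for illustration.

```r
library(data.table)

# Same example data as in the answer above
dt1 <- data.table(name  = c("Adam", "Bert", "Bert", "Adam"),
                  cntry = c("Germany", "Austria", "Germany", "Germany"),
                  occ   = c("Accounting", "Operations", "HR", "HR"))

# One row per distinct (name, cntry, occ) combination; N is how often it occurs
counts <- dt1[, .N, by = .(name, cntry, occ)]

# For each tuple (name, cntry), the number of distinct keys it maps to
keys_per_tuple <- dt1[, .(n_keys = uniqueN(occ)), by = .(name, cntry)]

print(counts)
print(keys_per_tuple)
```

With this layout, tuples that map to more than one key can be filtered with `keys_per_tuple[n_keys > 1]`.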

Answer 1: (score: 0)

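# example data
dt <- data.frame(name = c("Adam", "Bert", "Bert", "Adam"),
                 cntry = c("Germany", "Austria", "Germany", "Germany"),
                 occ = c("Accounting", "Operations", "HR", "HR"))
# build a "name,country" label per row; the copy serves as the reshape time variable
dt$tuple <- paste(dt$name, ",", dt$cntry, sep = "")
dt$tuple1 <- dt$tuple
# one row per occ, one column per distinct tuple
dt <- reshape(dt[, 3:5], idvar = "occ", timevar = "tuple1", direction = "wide")
dt[is.na(dt)] <- ""
# join the tuple columns with "|", then collapse runs of separators left by empty cells
dt <- data.frame(occ = dt[, 1], tuples = apply(dt[, -1], 1, paste, collapse = "|"))
dt$tuples <- gsub("\\|+", "|", dt$tuples)
dt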

Output

  occ                    tuples  
1 Accounting             Adam,Germany|  
2 Operations            |Bert,Austria|  
3 HR               Adam,Germany|Bert,Germany
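
The output above still carries stray leading or trailing `|` separators wherever a tuple column was empty. A possible alternative, sketched here under the assumption that the same three columns are used, is to group by `occ` with base R's `aggregate` and paste only the tuples that actually occur. The names `df` and `res` and the `"; "` separator are choices made for this illustration, not part of the answer.

```r
# Same example data; stringsAsFactors = FALSE keeps the columns as character
# on R versions older than 4.0 as well
df <- data.frame(name  = c("Adam", "Bert", "Bert", "Adam"),
                 cntry = c("Germany", "Austria", "Germany", "Germany"),
                 occ   = c("Accounting", "Operations", "HR", "HR"),
                 stringsAsFactors = FALSE)

# Label each row with its (name, country) tuple
df$tuple <- paste(df$name, df$cntry, sep = ",")

# One row per occ; the tuples seen with that occ are joined by "; "
res <- aggregate(tuple ~ occ, data = df, FUN = paste, collapse = "; ")

print(res)
```

Because only the tuples present in each group are pasted, no empty cells need to be cleaned up afterwards.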