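I hope the title is broad but still clear enough. Any hints or help are much appreciated.
I have a data frame in R with three columns and about 70k rows. I need to do the following:
Suppose I treat the first two columns of a row as a tuple (x, y) and the third column z as a "key" associated with that tuple. I want to count how often a given tuple (x, y) is associated with z, where (x, y) may be associated with more than one key z. This is only a first step in analyzing the map (x, y) -> z, but I think it would be convenient to have a data frame that tells me how often (x, y) is associated with z.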
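For example:
Adam, Germany, Accounting
Bert, Austria, Operations
Bert, Germany, HR
Adam, Germany, HR
I would like to see:
HR -> Bert, Germany; Adam, Germany
Operations -> Bert, Austria
Accounting -> Adam, Germany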
Answer 0 (score: 0)
Okay.
library(data.table)
# your data
dt1 <- data.table(name  = c("Adam", "Bert", "Bert", "Adam"),
                  cntry = c("Germany", "Austria", "Germany", "Germany"),
                  occ   = c("Accounting", "Operations", "HR", "HR"))
# make table: one row per (cntry, name), one count column per occ
dt2 <- dcast(dt1, cntry + name ~ occ, value.var = "occ", fun.aggregate = length)
# make grand total across the occ columns
dt2[, Total := rowSums(.SD), .SDcols = colnames(dt2)[-(1:2)]]
# output
     cntry name Accounting HR Operations Total
1: Austria Bert          0  0          1     1
2: Germany Adam          1  1          0     2
3: Germany Bert          0  1          0     1
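If a long table of counts is easier to work with than the wide one, the same counting can be sketched in base R. This is only an illustration under the assumption that the columns are named `name`, `cntry` and `occ` as above; the helper column `n` and the names `df` and `counts` are made up for this example.

```r
# Same sample data as above, as a plain data.frame
df <- data.frame(name  = c("Adam", "Bert", "Bert", "Adam"),
                 cntry = c("Germany", "Austria", "Germany", "Germany"),
                 occ   = c("Accounting", "Operations", "HR", "HR"))

# Add a helper column of ones, then sum it per (name, cntry, occ) combination
df$n <- 1L
counts <- aggregate(n ~ name + cntry + occ, data = df, FUN = sum)
counts
```

Each row of `counts` is one observed (name, cntry, occ) combination together with the number of times it occurs; combinations that never occur are simply absent rather than listed with a zero.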
Answer 1 (score: 0)
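dt <- data.frame(name  = c("Adam", "Bert", "Bert", "Adam"),
                 cntry = c("Germany", "Austria", "Germany", "Germany"),
                 occ   = c("Accounting", "Operations", "HR", "HR"))
# build the "name,cntry" tuple; the copy is used as the time variable for reshape()
dt$tuple <- paste(dt$name, ",", dt$cntry, sep = "")
dt$tuple1 <- dt$tuple
# one row per occ, one column per distinct tuple
dt <- reshape(dt[, 3:5], idvar = "occ", timevar = "tuple1", direction = "wide")
dt[is.na(dt)] <- ""
# paste the tuple columns together, separated by "|"
dt <- data.frame(occ = dt[, 1], tuples = apply(dt[, -1], 1, paste, collapse = "|"))
# collapse repeated separators, then drop leading and trailing ones
dt$tuples <- gsub("\\|+", "|", dt$tuples)
dt$tuples <- gsub("^\\||\\|$", "", dt$tuples)
dt
         occ                    tuples
1 Accounting              Adam,Germany
2 Operations              Bert,Austria
3         HR Adam,Germany|Bert,Germany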
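A shorter route to the same kind of result, sketched here as an alternative rather than as part of the answer above, is to build the tuple and collapse it per key with `aggregate()`. The names `df`, `res` and the "; " separator are assumptions chosen to mirror the layout asked for in the question.

```r
# Same sample data as above
df <- data.frame(name  = c("Adam", "Bert", "Bert", "Adam"),
                 cntry = c("Germany", "Austria", "Germany", "Germany"),
                 occ   = c("Accounting", "Operations", "HR", "HR"))

# Build the "name,cntry" tuple for each row
df$tuple <- paste(df$name, df$cntry, sep = ",")

# For each occ, paste its distinct tuples together in order of appearance
res <- aggregate(tuple ~ occ, data = df,
                 FUN = function(x) paste(unique(x), collapse = "; "))
res
```

Because `unique()` is applied before pasting, a tuple that appears several times for the same key is listed only once; drop `unique()` if repeats should be kept.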