我有两个data.frames如下:
dt2017 = data.frame(id=LETTERS[1:5],year=2017,city1=c(0,1,0,1,0),city2=c(0,0,1,0,0),city3=c(1,0,1,0,1),city4=c(0,0,0,0,1))
dt2017
id year city1 city2 city3 city4
1: A 2017 0 0 1 0
2: B 2017 1 0 0 0
3: C 2017 0 1 1 0
4: D 2017 1 0 0 0
5: E 2017 0 0 1 1
dt2016 = data.frame(id=LETTERS[1:5],year=2016,city1=c(0,0,0,0,1),city2=c(0,0,0,1,0),city3=c(0,0,1,0,1),city4=c(1,1,0,0,1))
dt2016
id year city1 city2 city3 city4|
1: A 2016 0 0 0 1
2: B 2016 0 0 0 1
3: C 2016 0 0 1 0
4: D 2016 0 1 0 0
5: E 2016 1 0 1 1
data.frame中的“1”表示在城市中工作。例如,在2016年,A,B和E在同一个城市工作4。首先,我想获得以下data.frame:
id 2016 2017 2016+2017
1: A B;E C;E B;C;E
2: B A;E D A;D;E
3: C E A;E A;E
4: D NA B B
5: E A;B;C A;C A;B;C
其次,我想得到一个像这样的data.frame:
id relation
A B
A C
A E
B A
B D
B E
D B
E A
E B
E C
任何建议都将不胜感激。
答案 0 :(得分:1)
我找到了一种方法来实现你想要的东西,但它并不是很漂亮。它仍然可以做你想要的。
library(plyr)
dt2017$id <- as.character(dt2017$id)
dt2016$id <- as.character(dt2016$id)
id <- dt2017$id
my_function <- function(dt, x){
tmp <- data.frame(id=id, dt[,dt[dt$id==x,]==1])
tmp$ind <- sapply(1:nrow(tmp), function(x) return(sum(tmp[x, 2:ncol(tmp)])))
return(paste(tmp[tmp$ind > 0 & tmp$id !=x,"id"], collapse=";"))
}
results1 <- data.frame(Year2016 = sapply(id, function(x) return(my_function(dt2016,x))),
Year2017 = sapply(id, function(x) return(my_function(dt2017,x))))
my_function2 <- function(dt, x){
tmp <- data.frame(id=id, dt[,dt[dt$id==x,]==1])
tmp$ind <- sapply(1:nrow(tmp), function(x) return(sum(tmp[x, 2:ncol(tmp)])))
if(length(tmp[tmp$ind > 0 & tmp$id !=x,"id"])!=0){
return(data.frame(id=x, relation=tmp[tmp$ind > 0 & tmp$id !=x,"id"]))
}
}
results_tmp <- rbind(adply(.data=id, .margins=1, .fun= function(x) return(my_function2(dt2016,x))),
adply(.data=id, .margins=1, .fun= function(x) return(my_function2(dt2017,x))))[, c("id", "relation")]
results2 <- unique(results_tmp[order(as.character(results_tmp$id)),])
fun_tmp <- function(x) return(paste(x, collapse=";"))
bothyear_tmp <- aggregate(list(relation=results2$relation), by=list(id=results2 $"id"), FUN=fun_tmp)
results1$BothYear <- bothyear_tmp[order(as.character(bothyear_tmp$id)),"relation"]
以下是结果:
results1
Year2016 Year2017 BothYear
A B;E C;E B;E;C
B A;E D A;E;D
C E A;E E;A
D B B
E A;B;C A;C A;B;C
results2
id relation
A B
A E
A C
B A
B E
B D
C E
C A
D B
E A
E B
E C