How to intersect values from two data frames with R

Time: 2015-10-30 21:27:26

Tags: r merge dataframe transpose cbind

I would like to create a new column for a data frame with values from the intersection of a row and a column.

I have a data.frame called "time":

q   1    2   3   4    5
a   1    13  43  5    3
b   2    21  12  3353 34
c   3    21  312 123  343
d   4    123 213 123  35
e   4556 11  123 12   3

And another table, called "event":

q   dt
a   1
b   3
c   4
d   2
e   1

I want to add another column called inter to the second table, filled with the values found at the intersection of row q and column dt in the first data.frame. So the result would be this:

q   dt  inter
a   1   1
b   3   12
c   4   123
d   2   123
e   1   4556

I have tried to use merge(event, time, by.x = "q", by.y = "dt"), but it generates an error saying they don't share the same id. I have also tried transposing the time data.frame to cross-reference the values, but without success.

2 answers:

Answer 0 (score: 2)

library(reshape2)
merge(event, melt(time, id.vars = "q"), 
      by.x=c('q','dt'), by.y=c('q','variable'), all.x = TRUE)

Output:

  q dt value
1 a  1     1
2 b  3    12
3 c  4   123
4 d  2   123
5 e  1  4556

Notes

We use the function melt from the package reshape2 to convert the data frame time from wide to long format, so that each combination of q and column name becomes one row, with the column name in variable and the cell in value. We then merge (left outer join) the data frames event and the melted time on two columns (q and dt in event, q and variable in the melted time).
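For readers without reshape2, the same wide-to-long-then-merge idea can be sketched in base R. This is only an illustration, not the answer's own code: the names value_cols, long and result are made up here, the data frames are rebuilt inline from the tables in the question, and the long table's columns are named dt and inter directly so that a plain by = c("q", "dt") works.

```r
# Rebuild the question's data (check.names = FALSE keeps the names "1".."5")
time <- data.frame(q = c("a", "b", "c", "d", "e"),
                   `1` = c(1L, 2L, 3L, 4L, 4556L),
                   `2` = c(13L, 21L, 21L, 123L, 11L),
                   `3` = c(43L, 12L, 312L, 213L, 123L),
                   `4` = c(5L, 3353L, 123L, 123L, 12L),
                   `5` = c(3L, 34L, 343L, 35L, 3L),
                   check.names = FALSE)
event <- data.frame(q = c("a", "b", "c", "d", "e"),
                    dt = c(1L, 3L, 4L, 2L, 1L))

# Wide to long by hand: one row per (q, column name) pair
value_cols <- setdiff(names(time), "q")
long <- data.frame(
  q     = rep(time$q, times = length(value_cols)),
  dt    = rep(as.integer(value_cols), each = nrow(time)),
  inter = unlist(time[value_cols], use.names = FALSE)
)

# Left outer join: keep every row of event, attach the matching cell
result <- merge(event, long, by = c("q", "dt"), all.x = TRUE)
print(result)
```

Converting the column names to integers before the join keeps both dt columns the same type, so the merge does not depend on how a factor is matched against an integer.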

Data:

time <- structure(list(q = structure(1:5, .Label = c("a", "b", "c", "d", 
"e"), class = "factor"), `1` = c(1L, 2L, 3L, 4L, 4556L), `2` = c(13L, 
21L, 21L, 123L, 11L), `3` = c(43L, 12L, 312L, 213L, 123L), `4` = c(5L, 
3353L, 123L, 123L, 12L), `5` = c(3L, 34L, 343L, 35L, 3L)), .Names = c("q", 
"1", "2", "3", "4", "5"), class = "data.frame", row.names = c(NA, 
-5L))

event <- structure(list(q = structure(1:5, .Label = c("a", "b", "c", "d", 
"e"), class = "factor"), dt = c(1L, 3L, 4L, 2L, 1L)), .Names = c("q", 
"dt"), class = "data.frame", row.names = c(NA, -5L))

Answer 1 (score: 0)

This may be a little clunky but it works:

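# Join once so every row carries its dt value (columns: q, 1..5, dt)
xx <- merge(time, event, by = 'q')
dt <- xx$dt
inter <- c()
for (i in 1:nrow(xx)) {
    # Column 1 is q, so the column named dt[i] sits at position dt[i] + 1
    inter <- c(inter, xx[i, dt[i] + 1])
}
final <- cbind(xx[, 1], dt, inter)
colnames(final) <- c('q', 'dt', 'inter')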

Hope it helps.

Output:

     q dt inter
[1,] 1  1     1
[2,] 2  3    12
[3,] 3  4   123
[4,] 4  2   123
[5,] 5  1  4556
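
The row-by-row lookup above can also be written without a loop by indexing a matrix with a two-column matrix of (row, column) positions. The following is a minimal sketch under the assumption that every q and dt in event exists in time; the names m, rows and cols are illustrative, and the data frames are rebuilt inline from the question.

```r
# Rebuild the question's data (check.names = FALSE keeps the names "1".."5")
time <- data.frame(q = c("a", "b", "c", "d", "e"),
                   `1` = c(1L, 2L, 3L, 4L, 4556L),
                   `2` = c(13L, 21L, 21L, 123L, 11L),
                   `3` = c(43L, 12L, 312L, 213L, 123L),
                   `4` = c(5L, 3353L, 123L, 123L, 12L),
                   `5` = c(3L, 34L, 343L, 35L, 3L),
                   check.names = FALSE)
event <- data.frame(q = c("a", "b", "c", "d", "e"),
                    dt = c(1L, 3L, 4L, 2L, 1L))

# Drop q first so the matrix stays numeric instead of being coerced to character
m <- as.matrix(time[, -1])

# Translate each event row into a (row, column) position in m
rows <- match(event$q, time$q)
cols <- match(as.character(event$dt), colnames(m))

# A two-column index matrix picks one cell per row of the index
event$inter <- m[cbind(rows, cols)]
print(event)
```

Because the positions come from match(), this does not rely on the two tables being in the same row order, and q keeps its original labels instead of being turned into numbers by cbind.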