Matching data from 3 different data frames

Time: 2013-08-27 20:33:34

Tags: r dataframe match

I have 3 data frames as follows:

d1 <- data.frame(level1 =  c("A", "A", "B", "C", "C"), 
             level2 = c("AA", "AB", "BA", "CA", "CB"))

d2 <- data.frame(level1 =  c("A", "A", "A","B", "B", "C", "C"), 
             level3 = c("1", "2", "4", "2", "3", "1", "5"))

d3<- data.frame(level3 = c("1", "2", "3", "4", "5"), AA = c("v1", "v2", "v3", "v4", "v5"), 
            AB = c("v6", "v7", "v8", "v9", "v10"), BA = c("v11", "v12", "v13", "v14", "v15"), 
            CA = c("v16", "v17", "v18", "v19", "v20"),  CB = c("v21", "v22", "v23", "v24", "v25"))

I would like to get these three data frames as output:

A <- data.frame(level3 = c("1", "2", "4"), AA = c("v1", "v2", "v4"), AB = c("v6", "v7", "v9"))

B <- data.frame(level3 = c( "2", "3"), BA = c("v12", "v13"))

C <- data.frame(level3 = c("1", "5"), CA = c("v16", "v20"), CB = c("v21", "v25"))

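From the 3 data frames provided (d1, d2 and d3), I want to produce a separate data frame for each level1 value (A, B, C, ...).

The columns of each output data frame should follow d1. The rows should contain the level3 values that d2 associates with that level1 value.

For example:

According to d1, AA and AB match A, so data frame A should contain these 2 columns.

According to d2, 1, 2 and 4 match A, so these should be the rows of data frame "A".

The values in data frame "A" should come from d3. I hope I have explained myself clearly. Thanks.

Any ideas on how to do this?

In my real data, the level1 and level2 names have nothing in common, so the match cannot rely on the names themselves.

Thanks for your support.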

2 answers:

Answer 0 (score: 1)

Using melt and dcast from reshape2, along with merge and split:

library(reshape2)
# merge three data sets together (putting d3 in long form)
full <- merge(merge(d1,d2),melt(d3, id = 1, variable.name = 'level2'))
# split by level1, then cast each piece back to wide form (one column per level2)
results <- lapply(split(full, full$level1), dcast, formula = level3 ~ level2, value.var = 'value')

# the results are in a list, we can copy to the global environment using `list2env`
# if you want (but you may wish to keep them as a list)
list2env(results, .GlobalEnv)
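
On the choice between keeping the list and copying into the global environment, here is a minimal sketch. The `results` list below is a toy stand-in with made-up contents (not the output of the code above), and `env` and `row_counts` are hypothetical names used only for illustration.

```r
# toy stand-in for the `results` list (hypothetical data, not from the question)
results <- list(A = data.frame(x = 1:2), B = data.frame(x = 3))

# option 1: keep the list and work on all elements at once
row_counts <- vapply(results, nrow, integer(1))

# option 2: copy the elements into a dedicated environment instead of .GlobalEnv,
# so existing objects named A or B in the workspace are not overwritten
env <- new.env()
list2env(results, envir = env)
ls(env)
```

Keeping the list usually makes it easier to loop over the pieces later; copying to an environment is mainly a convenience for interactive work.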

Answer 1 (score: 0)

This is a bit clunky, but I think it does what you want:

# put d1 and d2 in a single table
dm <- merge(d1, d2)

# divide in individual dataframes based on level1 value
dspl <- split(dm, dm$level1)

# identify unique values for each level1 value
int1 <- lapply(dspl, apply, 2, unique)

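# build each data frame: rows whose level3 value matches (rather than relying on
# row names happening to equal level3), columns named by the level2 values
int2 <- lapply(int1, function(x) d3[d3$level3 %in% x[[3]], c("level3", x[[2]]), drop = FALSE])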

# get the names of the level1 value to assign to objects
ndf <- names(int2)

# assign each dataframe to an object in the global environment
dmm <- lapply(ndf, function(lab) assign(lab, int2[[lab]], .GlobalEnv))
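
The same idea (columns chosen through d1, rows chosen through d2) can probably be written more directly in base R. The following is only a sketch under the assumption that the data look like the example in the question; `cols_by_level1`, `rows_by_level1` and `results` are names introduced here for illustration, and the result is kept as a list rather than assigned to the global environment.

```r
# data from the question
d1 <- data.frame(level1 = c("A", "A", "B", "C", "C"),
                 level2 = c("AA", "AB", "BA", "CA", "CB"))
d2 <- data.frame(level1 = c("A", "A", "A", "B", "B", "C", "C"),
                 level3 = c("1", "2", "4", "2", "3", "1", "5"))
d3 <- data.frame(level3 = c("1", "2", "3", "4", "5"),
                 AA = c("v1", "v2", "v3", "v4", "v5"),
                 AB = c("v6", "v7", "v8", "v9", "v10"),
                 BA = c("v11", "v12", "v13", "v14", "v15"),
                 CA = c("v16", "v17", "v18", "v19", "v20"),
                 CB = c("v21", "v22", "v23", "v24", "v25"))

# column names (level2) and row keys (level3) grouped by level1;
# as.character() guards against factor columns in older R versions
cols_by_level1 <- split(as.character(d1$level2), d1$level1)
rows_by_level1 <- split(as.character(d2$level3), d2$level1)

# one data frame per level1 value, returned as a named list
results <- sapply(names(cols_by_level1), function(key) {
  d3[d3$level3 %in% rows_by_level1[[key]],
     c("level3", cols_by_level1[[key]]),
     drop = FALSE]
}, simplify = FALSE)

results$A
```

Because the lookup goes through d1 and d2 only, it does not depend on the level1 and level2 names resembling each other.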