I have 3 data frames as follows:
d1 <- data.frame(level1 = c("A", "A", "B", "C", "C"),
level2 = c("AA", "AB", "BA", "CA", "CB"))
d2 <- data.frame(level1 = c("A", "A", "A","B", "B", "C", "C"),
level3 = c("1", "2", "4", "2", "3", "1", "5"))
d3<- data.frame(level3 = c("1", "2", "3", "4", "5"), AA = c("v1", "v2", "v3", "v4", "v5"),
AB = c("v6", "v7", "v8", "v9", "v10"), BA = c("v11", "v12", "v13", "v14", "v15"),
CA = c("v16", "v17", "v18", "v19", "v20"), CB = c("v21", "v22", "v23", "v24", "v25"))
I would like to get these three data frames as output:
A <- data.frame(level3 = c("1", "2", "4"), AA = c("v1", "v2", "v4"), AB = c("v6", "v7", "v9"))
B <- data.frame(level3 = c( "2", "3"), BA = c("v12", "v13"))
C <- data.frame(level3 = c("1", "5"), CA = c("v16", "v20"), CB = c("v21", "v25"))
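From the 3 data frames provided (d1, d2 and d3), I want to produce a separate data frame for each level1 value (A, B, C, ...).
Each output data frame should contain the columns that d1 assigns to that level1 value. Its rows should be the level3 values that d2 assigns to it.
For example:
According to d1, AA and AB match A, so data frame A should contain these 2 columns.
According to d2, 1, 2 and 4 match A, so these should be the rows of data frame A.
The values in data frame A should come from d3. I hope I have explained myself clearly.
Any ideas on how to do this?
In my real data, the level1 and level2 names have nothing in common, so the columns cannot be picked by matching name prefixes.
Thanks for your support.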
Answer 0 (score: 1)
Using melt and dcast from reshape2, together with merge and split:
library(reshape2)
# merge three data sets together (putting d3 in long form)
full <- merge(merge(d1,d2),melt(d3, id = 1, variable.name = 'level2'))
# split by level1, then cast each piece back to wide form (one column per level2)
results <- lapply(split(full, full$level1), dcast, formula = level3 ~ level2, value.var = 'value')
# the results are in a list; we can copy them to the global environment using `list2env`
# if you want (but you may wish to keep them as a list)
list2env(results, .GlobalEnv)
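If adding a dependency on reshape2 is not desirable, the same result can probably be reached in base R alone. The following is a minimal sketch rather than part of the answer above: the names `keys` and `results_base` are made up for illustration, and it assumes d3 holds exactly one row per level3 value.

```r
d1 <- data.frame(level1 = c("A", "A", "B", "C", "C"),
                 level2 = c("AA", "AB", "BA", "CA", "CB"))
d2 <- data.frame(level1 = c("A", "A", "A", "B", "B", "C", "C"),
                 level3 = c("1", "2", "4", "2", "3", "1", "5"))
d3 <- data.frame(level3 = c("1", "2", "3", "4", "5"),
                 AA = c("v1", "v2", "v3", "v4", "v5"),
                 AB = c("v6", "v7", "v8", "v9", "v10"),
                 BA = c("v11", "v12", "v13", "v14", "v15"),
                 CA = c("v16", "v17", "v18", "v19", "v20"),
                 CB = c("v21", "v22", "v23", "v24", "v25"))

# one entry per level1 value (hypothetical helper name)
keys <- sort(unique(as.character(d1$level1)))

results_base <- setNames(lapply(keys, function(k) {
  # columns of d3 that d1 assigns to this level1 value
  cols <- as.character(d1$level2[d1$level1 == k])
  # level3 values that d2 assigns to this level1 value
  rows <- as.character(d2$level3[d2$level1 == k])
  # subset d3 by value, not by row position or row name
  out <- d3[d3$level3 %in% rows, c("level3", cols), drop = FALSE]
  rownames(out) <- NULL
  out
}), keys)
```

`results_base$A`, `results_base$B` and `results_base$C` can then be used directly, or copied out with `list2env` as shown above. Rows come back in the order they appear in d3, not in the order listed in d2.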
Answer 1 (score: 0)
This is a bit clunky, but I think it does what you want:
# put d1 and d2 in a single table
dm <- merge(d1, d2)
# divide in individual dataframes based on level1 value
dspl <- split(dm, dm$level1)
# identify unique values for each level1 value
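# (kept as a named list per column, so the result never collapses into a matrix)
int1 <- lapply(dspl, lapply, function(col) unique(as.character(col)))
# create a new dataframe: match rows on the level3 values (not on row names)
# and keep only the level2 columns that belong to this level1 value
int2 <- lapply(int1, function(x) d3[d3$level3 %in% x$level3, c("level3", x$level2)])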
# get the names of the level1 value to assign to objects
ndf <- names(int2)
# assign each dataframe to an object in the global environment
dmm <- lapply(ndf, function(lab) assign(lab, int2[[lab]], .GlobalEnv))
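Writing objects into the global environment can silently overwrite existing variables named A, B or C. One possible alternative, sketched here under assumptions, is to copy the list into a dedicated environment instead. `pieces` is a hypothetical stand-in for the list of per-level1 data frames, and `out_env` is a made-up name.

```r
# hypothetical stand-in for the list of per-level1 data frames
pieces <- list(
  A = data.frame(level3 = c("1", "2", "4")),
  B = data.frame(level3 = c("2", "3"))
)

# a separate environment keeps the global workspace untouched
out_env <- new.env()
list2env(pieces, envir = out_env)

# objects are retrieved by name when needed
a_df <- get("A", envir = out_env)
```

Keeping the data frames in the list and indexing with `pieces[["A"]]` is simpler still when the names are only known at run time.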