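The title isn't the best, but this is a complex task to explain in one sentence. I am working on dynamically generating three lists of data frames, where every element in each list corresponds to the elements at the same position in the other lists. For example, L1[[1]], L2[[1]], and L3[[1]] hold data that relate to each other, and so on. I have worked on this for about 35 hours and cannot figure out a simplified way to make it work.
There are two main sources of data, which I am calling "sort" and "ref", for sorting and reference.
Here is a very simplified version of the reference data. The real data come from multiple .csv files. This data can contain n rows with static column names, where ID is the ID of the row and each column represents a variable. An appended R means "right"; the absence of an R means "left":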
R1 <- c(1,200,201,20,21,300,301,30,31,400,401,40,41)
R2 <- c(2,201,202,21,22,301,302,31,32,401,402,41,42)
R3 <- c(3,200,201,20,21,300,301,30,31,NA,NA,NA,NA)
R4 <- c(4,201,202,21,22,NA,NA,NA,NA,401,402,41,42)
ref <- data.frame(rbind(R1,R2,R3,R4))
colnames(ref) <- c("ID","H1","H1R","H2","H2R","U1","U1R","U2","U2R","R1","R1R","R2","R2R")
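The naming convention above (a trailing R marks the right side) can be written down as a small helper. This is only a sketch; `ref_col` is a hypothetical name that does not appear in the original code.

```r
# Hypothetical helper: map variable names and one side to the matching
# column names in ref ("H1" + "right" -> "H1R", "H1" + "left" -> "H1").
ref_col <- function(var, side) {
  if (identical(side, "right")) paste0(var, "R") else var
}

right_cols <- ref_col(c("H1", "H2"), "right")
left_cols  <- ref_col("U1", "left")
print(right_cols)
print(left_cols)
```

A helper like this keeps the side-to-column rule in one place, which matters when the variable names cannot be hard-coded.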
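The sort data is similar but formatted differently. It has the corresponding column names, but without the appended R; instead, the side is listed separately in its own Side column: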
R1 <- c(1,"left","H",1,200,20,NA,NA,NA,NA)
R2 <- c(2,"right","H",2,201,21,NA,NA,NA,NA)
R3 <- c(3,"left","R",2,NA,NA,NA,NA,400,40)
R4 <- c(4,"right","R",3,NA,NA,NA,NA,401,41)
R5 <- c(5,"left","U",2,NA,NA,300,30,NA,NA)
R6 <- c(6,"right","U",5,NA,NA,301,31,NA,NA)
sort <- data.frame(rbind(R1,R2,R3,R4,R5,R6))
colnames(sort) <- c("ID","Side","Element","Individual","H1","H2","U1","U2","R1","R2")
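One side effect of building the rows with `c()` is worth noting: mixing numbers and strings coerces every value to character, so all columns of the resulting data frame are character (or factor in older R versions). A minimal sketch of restoring the numeric columns follows, assuming the data is stored as `sort_df` (a name chosen here to avoid reusing the name of `base::sort`).

```r
# The question's sort data, stored under the assumed name sort_df.
R1 <- c(1,"left","H",1,200,20,NA,NA,NA,NA)
R2 <- c(2,"right","H",2,201,21,NA,NA,NA,NA)
R3 <- c(3,"left","R",2,NA,NA,NA,NA,400,40)
R4 <- c(4,"right","R",3,NA,NA,NA,NA,401,41)
R5 <- c(5,"left","U",2,NA,NA,300,30,NA,NA)
R6 <- c(6,"right","U",5,NA,NA,301,31,NA,NA)
sort_df <- data.frame(rbind(R1,R2,R3,R4,R5,R6), stringsAsFactors = FALSE)
colnames(sort_df) <- c("ID","Side","Element","Individual","H1","H2","U1","U2","R1","R2")

# Every column is character at this point.
classes_before <- sapply(sort_df, class)

# type.convert() re-guesses each column's type; as.is = TRUE keeps
# genuine strings as character instead of turning them into factors.
sort_df <- type.convert(sort_df, as.is = TRUE)
classes_after <- sapply(sort_df, class)
print(classes_after)
```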
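The code I am trying to develop will use two further objects that hold element names. The first can contain any number of elements, while the second always contains exactly one. These correspond to the "Element" variable in the sort and reference data. For example: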
B1 <- c("H","U")
B2 <- "R"
The first task is to split the sort data by B1 and B2 using the following code:
sort1 <- array()
for (i in B1) {
sort1 <- rbind(sort1, sort[sort$Element == i,])
}
sort1 <- sort1[-1,] #removes first row from binding NA row
sort2 <- sort[sort$Element == B2,]
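For comparison, the same split can be sketched without a loop by using `%in%`. The data frame below is a typed equivalent of the question's sort data, stored under the assumed name `sort_df`. Note that `%in%` keeps the original row order, whereas the loop orders rows by the elements of B1; for this example the two agree.

```r
# Typed equivalent of the question's sort data (sort_df is an assumed name).
sort_df <- data.frame(
  ID = 1:6,
  Side = c("left","right","left","right","left","right"),
  Element = c("H","H","R","R","U","U"),
  Individual = c(1,2,2,3,2,5),
  H1 = c(200,201,NA,NA,NA,NA), H2 = c(20,21,NA,NA,NA,NA),
  U1 = c(NA,NA,NA,NA,300,301), U2 = c(NA,NA,NA,NA,30,31),
  R1 = c(NA,NA,400,401,NA,NA), R2 = c(NA,NA,40,41,NA,NA),
  stringsAsFactors = FALSE
)
B1 <- c("H","U")
B2 <- "R"

# Rows whose Element is any of B1, and rows whose Element is B2.
sort1 <- sort_df[sort_df$Element %in% B1, ]
sort2 <- sort_df[sort_df$Element == B2, ]
print(sort1)
print(sort2)
```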
Now we have sort1 and sort2. The next task is to group sort1 by identical Individual number using the following code:
a1 <- list(data.frame())
j <- 1
for (i in unique(sort1$Individual)) {
a1[[j]] <- sort1[sort1$Individual == i,]
j <- j + 1
}
sort1 <- a1
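The grouping loop above can also be sketched with base R's `split()`, which returns a list named by the grouping values. The small data frame here is an assumed stand-in for sort1 with only the columns needed to show the idea.

```r
# Assumed stand-in for sort1: the H and U rows of the sort data.
sort1 <- data.frame(
  ID = c(1, 2, 5, 6),
  Side = c("left","right","left","right"),
  Element = c("H","H","U","U"),
  Individual = c(1, 2, 2, 5),
  stringsAsFactors = FALSE
)

# One data frame per distinct Individual; list names are the Individual values.
groups <- split(sort1, sort1$Individual)
print(names(groups))
print(groups[["2"]])
```

Unlike the counter-based loop, the result is indexed by name, so `groups[["2"]]` refers to Individual 2 rather than to the second list position.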
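At this stage we have a list, sort1, that contains data frames of rows sharing the same Individual number; sort2, whose elements are single rows; and ref, which holds the reference for all variables. Now I need to create unique combinations between sort1 and sort2, where each data frame in sort1 is combined with each row of sort2 as long as the side and element in sort1 are not both the same as in sort2, using the following code: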
a1 <- list(data.frame())
a2 <- list(data.frame())
x <- 1
for(i in 1:length(sort1)) {
for(j in 1:nrow(sort2)) {
if(sort1[[i]]$Element != sort2[j,]$Element || sort1[[i]]$Side != sort2[j,]$Side) {
a1[[x]] <- sort1[[i]][,colSums(is.na(sort1[[i]])) < nrow(sort1[[i]])] #removes NAs
a2[[x]] <- sort2[j,][,colSums(is.na(sort2[j,])) < nrow(sort2[j,])] #removes NAs
}
x <- x + 1
}
}
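Now a1 and a2 are both lists containing the corresponding combinations, where each index of one list corresponds to the same index of the other. So a1[[1]] corresponds to a2[[1]], and so on.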
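The pairing step can be sketched with `expand.grid()` instead of nested loops and a manual counter. This sketch assumes that every sort1 group is paired with every sort2 row, which is what the six desired combinations listed later in the question suggest; the side/element test from the loop is not reproduced. As an aside, in R 4.3.0 and later `||` raises an error when either operand has more than one element, which the loop's condition would hit for the two-row group. The names `groups`, `pairs`, and `drop_all_na` are assumptions.

```r
# Assumed stand-ins for the grouped sort1 data and for sort2.
sort1 <- data.frame(
  ID = c(1, 2, 5, 6), Side = c("left","right","left","right"),
  Element = c("H","H","U","U"), Individual = c(1, 2, 2, 5),
  H1 = c(200,201,NA,NA), H2 = c(20,21,NA,NA),
  U1 = c(NA,NA,300,301), U2 = c(NA,NA,30,31),
  R1 = NA_real_, R2 = NA_real_, stringsAsFactors = FALSE
)
sort2 <- data.frame(
  ID = c(3, 4), Side = c("left","right"), Element = "R", Individual = c(2, 3),
  H1 = NA_real_, H2 = NA_real_, U1 = NA_real_, U2 = NA_real_,
  R1 = c(400, 401), R2 = c(40, 41), stringsAsFactors = FALSE
)
groups <- split(sort1, sort1$Individual)

# Drop columns that are NA in every row (same rule as the loop's colSums test).
drop_all_na <- function(df) df[, colSums(is.na(df)) < nrow(df), drop = FALSE]

# All (group, sort2 row) index pairs; j varies fastest, matching the loop order.
pairs <- expand.grid(j = seq_len(nrow(sort2)), i = seq_along(groups))
a1 <- lapply(pairs$i, function(i) drop_all_na(groups[[i]]))
a2 <- lapply(pairs$j, function(j) drop_all_na(sort2[j, , drop = FALSE]))
print(a1[[4]])
print(a2[[4]])
```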
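All of this code works. What I need to do now is create a list of reference data, similar to how I split the sort data. The reference criteria are as follows. The reference ID must be the same for all variables of each combination (i.e. each a1[[1]], a2[[2]]), which means there cannot be any NAs in the reference data for each unique combination.
For example, for a1[[4]] and a2[[4]], I need to extract the reference data that correspond to the available sides and variables.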
a1[[4]]:
ID Side Element Individual H1 H2 U1 U2
2 right H 2 201 21 <NA> <NA>
5 left U 2 <NA> <NA> 300 30
a2[[4]]:
ID Side Element Individual R1 R2
4 right R 3 401 41
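I need a3[[4]] to look like the following, with no NAs and with data available for every variable for each single ID. At that point, all three lists of data frames hold corresponding data: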
ID H1R H2R U1  U2 R1R R2R
1  201 21  300 30 401 41
2  202 22  301 31 402 42
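One possible way to derive such a table, offered as a sketch rather than a definitive solution: for each row of a pair, collect the variables that are not NA, append "R" when the row's side is "right", and keep only the ref rows that are complete across the collected columns. The names `key_cols`, `ref_cols_for`, and `ref_for_pair` are hypothetical, and `x` and `y` below are hand-built copies of a1[[4]] and a2[[4]].

```r
ref <- data.frame(rbind(
  c(1,200,201,20,21,300,301,30,31,400,401,40,41),
  c(2,201,202,21,22,301,302,31,32,401,402,41,42),
  c(3,200,201,20,21,300,301,30,31,NA,NA,NA,NA),
  c(4,201,202,21,22,NA,NA,NA,NA,401,402,41,42)))
colnames(ref) <- c("ID","H1","H1R","H2","H2R","U1","U1R","U2","U2R","R1","R1R","R2","R2R")

# Hand-built copies of a1[[4]] and a2[[4]] from the question.
x <- data.frame(ID = c(2, 5), Side = c("right","left"), Element = c("H","U"),
                Individual = c(2, 2), H1 = c(201, NA), H2 = c(21, NA),
                U1 = c(NA, 300), U2 = c(NA, 30), stringsAsFactors = FALSE)
y <- data.frame(ID = 4, Side = "right", Element = "R", Individual = 3,
                R1 = 401, R2 = 41, stringsAsFactors = FALSE)

key_cols <- c("ID","Side","Element","Individual")

# For each row, find its non-NA variables and translate them to ref column names.
ref_cols_for <- function(df) {
  vars <- setdiff(names(df), key_cols)
  unlist(lapply(seq_len(nrow(df)), function(r) {
    used <- vars[!is.na(unlist(df[r, vars]))]
    if (length(used) == 0) return(character(0))
    if (df$Side[r] == "right") paste0(used, "R") else used
  }))
}

# Keep ID plus the needed columns, then only the rows without any NA.
ref_for_pair <- function(x, y, ref) {
  cols <- c(ref_cols_for(x), ref_cols_for(y))
  out <- ref[, c("ID", cols), drop = FALSE]
  out[complete.cases(out), , drop = FALSE]
}

a3_4 <- ref_for_pair(x, y, ref)
print(a3_4)
```

Because the variable names are discovered from the data rather than written into the code, the same function could be applied across both lists with `mapply(ref_for_pair, a1, a2, MoreArgs = list(ref = ref), SIMPLIFY = FALSE)`.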
The following code is my attempt at this, but it is currently broken and horribly designed:
zz <- 1
REF1 <- list(data.frame())
myfun <- function(x,y) {
names1 <- colnames(x[5:ncol(x)])
names2 <- colnames(y[5:ncol(y)])
names <- c(names1, names2)
IND1 <- data.frame()
IND2 <- data.frame()
for(n in names1) {
for(i in nrow(x)) {
if(x[i,]$Side == "left") {
if(!is.na(x[i,][[n]])) {
name <- n
IND1[zz] <- ref[[x[i,]$Element]][[name]]
}
}
if(x[i,]$Side == "right") {
if(!is.na(x[i,][[n]])) {
name <- paste(n, "R", sep="")
IND2[zz] <- ref[[x[i,]$Element]][[name]]
}
}
}
}
DEP1 <- data.frame()
DEP2 <- data.frame()
for(n in names2) {
if(y$Side == "left") {
if(!is.na(y[[n]])) {
name <- n
DEP1[zz] <- ref[[y$Element]][[name]]
}
}
if(y$Side == "right") {
if(!is.na(y[[n]])) {
name <- paste(n, "R", sep="")
DEP2[zz] <- ref[[y$Element]][[name]]
}
}
}#names
REF1[[zz]] <- cbind(IND1, IND2, DEP1, DEP2)
zz <- zz + 1
return(REF1)
}#myfun
output1 <- mapply(myfun, x = a1, y = a2)
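Any help is greatly appreciated. I have tried to simplify the problem. Please let me know if I need to clarify anything! The end goal is to use the three lists of data frames in various statistical tests, where each index of each list represents a single combination, including the reference data to be used.
EDITED: Even though the column names for the variables are static, I cannot specify them directly in the code, because they may or may not be used (with the exception of ID, Side, Element, and Individual). In reality, I have up to 185 different variables and the full dataset.
EDITED: Below are the desired combinations of the sort data. They are not in any particular data frame; this is only a visual representation of the combinations.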
[[1]]
ID Side Element Individual H1 H2 ID Side Element Individual R1 R2
1 left H 1 200 20 3 left R 2 400 40
[[2]]
ID Side Element Individual H1 H2 ID Side Element Individual R1 R2
1 left H 1 200 20 4 right R 3 401 41
[[3]]
ID Side Element Individual H1 H2 U1 U2 ID Side Element Individual R1 R2
2 right H 2 201 21 <NA> <NA> 3 left R 2 400 40
5 left U 2 <NA> <NA> 300 30
[[4]]
ID Side Element Individual H1 H2 U1 U2 ID Side Element Individual R1 R2
2 right H 2 201 21 <NA> <NA> 4 right R 3 401 41
5 left U 2 <NA> <NA> 300 30
[[5]]
ID Side Element Individual U1 U2 ID Side Element Individual R1 R2
6 right U 5 301 31 3 left R 2 400 40
[[6]]
ID Side Element Individual U1 U2 ID Side Element Individual R1 R2
6 right U 5 301 31 4 right R 3 401 41
Answer 0 (score: 0)
Split the sort data set into data frames based on B1 and B2, then split the sort1 data frame again into a list of data frames:
library(dplyr) # provides %>% and filter()
f <- function(x_tbl){
x_tbl %>% (function(x){
# find string occurrences of "H|U" in the Element column and filter
a <- x %>% filter(grepl(paste0(B1, collapse = "|"), Element))
# filter sort table where the element is "R"
b <- x[x$Element == B2,]
# combine into list
new_split <- list(a,b)
# naming just so I can follow along here
names(new_split) <- c('sort1','sort2')
# Splitting the sort1 table again, now by the Individual column
new_split[['sort1']] <- split(
new_split[['sort1']], new_split[['sort1']][['Individual']])
# Now have the list of variable tables and key tables...
# not sure what you're trying to say after this point
return(new_split)
})
}
> f(sort)
$sort1
$sort1$`1`
ID Side Element Individual H1 H2 U1 U2 R1 R2
1 1 left H 1 200 20 <NA> <NA> <NA> <NA>
$sort1$`2`
ID Side Element Individual H1 H2 U1 U2 R1 R2
2 2 right H 2 201 21 <NA> <NA> <NA> <NA>
3 5 left U 2 <NA> <NA> 300 30 <NA> <NA>
$sort1$`5`
ID Side Element Individual H1 H2 U1 U2 R1 R2
4 6 right U 5 <NA> <NA> 301 31 <NA> <NA>
$sort2
ID Side Element Individual H1 H2 U1 U2 R1 R2
R3 3 left R 2 <NA> <NA> <NA> <NA> 400 40
R4 4 right R 3 <NA> <NA> <NA> <NA> 401 41
https://gist.github.com/CarlBoneri/edd9ad9c89fdbf81a5ad87532228a8b0
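library(plyr)     # ldply(); loaded before dplyr so dplyr's verbs are not masked
library(dplyr)
library(jsonlite) # rbind.pages()
library(reshape2) # melt()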
#' Given the data frame `sort`
#' and the provided variables `B1` and `B2`, whereby `B1` represents an array
#' of `Elements` to be matched and compared against from table `sort`, given
#' the outlying variable input of `B2`, find all unique pair-values of
#' column vectors `H1:R2`
#'
#'
#' ## TARGET ELEMENTS
#' 1) Split the sort table into grouped tables, each returned item in the list
#' representing a unique `Element` variable from input `B1`
#'
# Setup by filtering matches of input `B1`
target_chunk <- sort[grepl(paste0(B1,collapse="|"),sort$Element),]
target_chunk
=== ===== ======= ========== === === === === === ===
ID Side Element Individual H1 H2 U1 U2 R1 R2
=== ===== ======= ========== === === === === === ===
1 left H 1 200 20 NA NA NA NA
2 right H 2 201 21 NA NA NA NA
5 left U 2 NA NA 300 30 NA NA
6 right U 5 NA NA 301 31 NA NA
=== ===== ======= ========== === === === === === ===
# Split on the individual
target_list_ind <- split(target_chunk, target_chunk$Individual)
target_list_ind
$`1`
ID Side Element Individual H1 H2 U1 U2 R1 R2
R1 1 left H 1 200 20 <NA> <NA> <NA> <NA>
$`2`
ID Side Element Individual H1 H2 U1 U2 R1 R2
R2 2 right H 2 201 21 <NA> <NA> <NA> <NA>
R5 5 left U 2 <NA> <NA> 300 30 <NA> <NA>
$`5`
ID Side Element Individual H1 H2 U1 U2 R1 R2
R6 6 right U 5 <NA> <NA> 301 31 <NA> <NA>
# Split each iteration of the `target_list_ind` on the Element
target_list_elm <- sapply(target_list_ind, function(i){
if(nrow(i)>1){
split(i, i[['Element']])
}else{
i
}
})
target_list_elm
> target_list_elm
$`1`
ID Side Element Individual H1 H2 U1 U2 R1 R2
R1 1 left H 1 200 20 <NA> <NA> <NA> <NA>
$`2`
$`2`$H
ID Side Element Individual H1 H2 U1 U2 R1 R2
R2 2 right H 2 201 21 <NA> <NA> <NA> <NA>
$`2`$U
ID Side Element Individual H1 H2 U1 U2 R1 R2
R5 5 left U 2 <NA> <NA> 300 30 <NA> <NA>
$`5`
ID Side Element Individual H1 H2 U1 U2 R1 R2
R6 6 right U 5 <NA> <NA> 301 31 <NA> <NA>
#' 2) Set up our inner key table that represents the `B2` variable
source_tbl <- sort[grepl(paste0(B2),sort$Element),]
#' 3) The big loop here.. not sure if the result is what you are referencing
#'
element_l_df <- lapply(target_list_elm, function(i){
if(is.data.frame(i)){
f_src <- source_tbl[i[['Side']] != source_tbl$Side,]
f_src <- f_src[i[['Element']] != f_src$Element,]
source_el <- f_src[['Element']]
target_el <- i
source_vals <- f_src %>% select(-c(ID, Individual,Element,Side))
target_vals <- i%>%select(-c(ID, Individual,Element,Side))
var_bound <- cbind(source_vals[!mapply(is.na, source_vals)],
target_vals[!mapply(is.na, target_vals)])
data.frame(individual = target_el[['Individual']],
source_element = source_el,
target_element = target_el[['Element']],
as.data.frame(var_bound))
}else{
ldply(1:length(i), function(x){
f_src <- source_tbl[i[[x]][['Side']] != source_tbl$Side,]
f_src <- f_src[i[[x]][['Element']] != f_src$Element,]
source_el <- f_src[['Element']]
target_el <- i[[x]]
target_vals <- i[[x]]%>%select(-c(ID, Individual,Element,Side))
source_vals <- f_src %>% select(-c(ID, Individual,Element,Side))
var_bound <- data.frame(source_vals[!mapply(is.na, source_vals)],
target_vals[!mapply(is.na, target_vals)])
data.frame(individual = target_el[['Individual']],
source_element = source_el,
target_element = target_el[['Element']],
as.data.frame(var_bound))
})
}
})
element_l_df
#' rbind.pages to put into 1 data frame
#'
rbind.pages(element_l_df)
========== ============== ============== === === === === === ===
individual source_element target_element R1 R2 H1 H2 U1 U2
========== ============== ============== === === === === === ===
1 R H 401 41 200 20 NA NA
2 R H 400 40 201 21 NA NA
2 R U 401 41 NA NA 300 30
5 R U 400 40 NA NA 301 31
========== ============== ============== === === === === === ===
A <- melt(rbind.pages(element_l_df), c("source_element","target_element", "individual"))
> head(A)
source_element target_element individual variable value
1 R H 1 R1 401
2 R H 2 R1 400
3 R U 2 R1 401
4 R U 5 R1 400
5 R H 1 R2 41
6 R H 2 R2 40