Question

I want to create two lists of data frames in a for loop, but I can't get assign to do it:

dat <- data.frame(name = c(rep("a", 10), rep("b", 13)),
                  x = c(1,3,4,4,5,3,7,6,5,7,8,6,4,3,9,1,2,3,5,4,6,3,1),
                  y = c(1.1,3.2,4.3,4.1,5.5,3.7,7.2,6.2,5.9,7.3,8.6,6.3,4.2,3.6,9.7,1.1,2.3,3.2,5.7,4.8,6.5,3.3,1.2))

a <- dat[dat$name == "a",]
b <- dat[dat$name == "b",]

samp <- vector(mode = "list", length = 100)
h <- list(a,b)
hname <- c("a", "b")

for (j in 1:length(h)) {
  for (i in 1:100) {
    samp[[i]] <- sample(1:nrow(h[[j]]), nrow(h[[j]])*0.5)
    assign(paste("samp", hname[j], sep="_"), samp[[i]])
  }
}

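I end up with vectors holding only the result of the 100th sample, rather than lists named samp_a and samp_b. What I want is a list samp_a and a list samp_b that contain all the different samples of dat[dat$name == "a",] and dat[dat$name == "b",] respectively.

What should I do?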

Answer 1

How about creating two separate lists and avoiding assign:

Option 1:

# create empty lists
samp_a <- list()
samp_b <- list()

for (j in seq(h)) {

    # fill samp_a list
    if(j == 1){
        for (i in 1:100) {
            samp_a[[i]] <- sample(1:nrow(h[[j]]), nrow(h[[j]])*0.5)
        }
      # fill samp_b list
    } else if(j == 2){
        for (i in 1:100) {
            samp_b[[i]] <- sample(1:nrow(h[[j]]), nrow(h[[j]])*0.5)
        }
    }
}

You can also use assign, which gives a shorter answer:

Option 2:

for (j in seq(hname)) {
    l = list()
    for (i in 1:100) {
        l[[i]] <- sample(1:nrow(h[[j]]), nrow(h[[j]])*0.5)
    }
    assign(paste0('samp_', hname[j]), l)
    rm(l)
}
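
If the variable names samp_a and samp_b are not strictly required, a single named list avoids both assign and the if/else branching. The following is only a sketch under that assumption: the container name `samp`, the loop variable `nm`, the call to `set.seed`, and the explicit `floor()` are my additions and not part of the question.

```r
set.seed(1)  # assumption: seed added only so the sketch is reproducible

dat <- data.frame(name = c(rep("a", 10), rep("b", 13)),
                  x = c(1,3,4,4,5,3,7,6,5,7,8,6,4,3,9,1,2,3,5,4,6,3,1),
                  y = c(1.1,3.2,4.3,4.1,5.5,3.7,7.2,6.2,5.9,7.3,8.6,6.3,4.2,3.6,9.7,1.1,2.3,3.2,5.7,4.8,6.5,3.3,1.2))

# Same two subsets as h in the question, but stored with names
h <- list(a = dat[dat$name == "a", ], b = dat[dat$name == "b", ])

# One slot per group; each slot will hold a list of 100 index vectors
samp <- setNames(vector("list", length(h)), names(h))

for (nm in names(h)) {
  n <- nrow(h[[nm]])
  samp[[nm]] <- vector("list", 100)
  for (i in seq_len(100)) {
    # floor() makes the rounding of half the row count explicit
    samp[[nm]][[i]] <- sample(seq_len(n), floor(n * 0.5))
  }
}
```

The results are then available as samp$a and samp$b (or samp[["a"]] and samp[["b"]]), and adding a third group only means adding another element to h.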

Answer 2

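You can do this easily with the lapply function combined with rep. This keeps the existing x/y pairing within each row; it is not the right approach if you wanted a random x paired with a random y.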

dat <- data.frame(name = c(rep("a", 10), rep("b", 13)),
              x = c(1,3,4,4,5,3,7,6,5,7,8,6,4,3,9,1,2,3,5,4,6,3,1),
              y = c(1.1,3.2,4.3,4.1,5.5,3.7,7.2,6.2,5.9,7.3,8.6,6.3,4.2,3.6,9.7,1.1,2.3,3.2,5.7,4.8,6.5,3.3,1.2))

a <- dat[dat$name == "a",]
b <- dat[dat$name == "b",]

h <- list(a,b)
hname <- c("a", "b")

testfunc <- function(df) {
  #df[sample(nrow(df), nrow(df)*0.5), ] #gives you the values in your data frame
  sample(nrow(df), nrow(df)*0.5) # just gives you the indices
}

lapply(h, testfunc) # Plain lapply: gives only one sample for a and one for b
samp <- lapply(rep(h, 100), testfunc) # rep(h, 100) repeats the list as a, b, a, b, ..., so this gives 100 samples for a and 100 for b in one list

samp_a <- samp[c(TRUE, FALSE)] # A recycled T/F vector selects the odd elements, which in this case are the `a` samples.
samp_b <- samp[c(FALSE, TRUE)] # And here the even elements, which are the `b` samples.
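
Since testfunc as written returns row indices, one more step is needed if the goal is a list of actual data frames. A minimal sketch of that step for the `a` group, assuming the names `idx_a` and `frames_a` (both hypothetical) and using replicate in place of rep plus lapply:

```r
set.seed(1)  # assumption: seed added only so the sketch is reproducible

dat <- data.frame(name = c(rep("a", 10), rep("b", 13)),
                  x = c(1,3,4,4,5,3,7,6,5,7,8,6,4,3,9,1,2,3,5,4,6,3,1),
                  y = c(1.1,3.2,4.3,4.1,5.5,3.7,7.2,6.2,5.9,7.3,8.6,6.3,4.2,3.6,9.7,1.1,2.3,3.2,5.7,4.8,6.5,3.3,1.2))

a <- dat[dat$name == "a", ]

# 100 index vectors, each covering half of the rows of a;
# simplify = FALSE keeps the result as a list instead of a matrix
idx_a <- replicate(100, sample(nrow(a), floor(nrow(a) * 0.5)), simplify = FALSE)

# Turn every index vector into the matching sub data frame;
# whole rows are selected, so each x stays with its original y
frames_a <- lapply(idx_a, function(i) a[i, ])
```

The same two lines with b in place of a would give the corresponding list for the `b` group.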