Question

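I have a list of data frames whose rows have been randomly shuffled. I want to mark the first 25% of the rows in every data frame as T and the remaining rows as F. For example: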

vec.1 <- c(1:574)
vec.2 <- c(3001:3574)
df.1 <- data.frame(vec.1, vec.2)
df.2 <- data.frame(vec.2, vec.1)

# 10 copies of df.1, each with its rows in a random order
my_list <- replicate(10, df.1[sample(nrow(df.1)), ], simplify = FALSE)

In this list of data frames, I want to mark the first 25% of the rows as T and all other rows as F. How can I do this?

Answer 1

You can write a small function like the following and use it with lapply:

myFun <- function(indf) {
  # TRUE for rows in the first 25% of the data frame, FALSE for the rest
  indf$vec.3 <- seq_len(nrow(indf)) <= .25 * nrow(indf)
  indf
}

Usage is then simply lapply(my_list, myFun).
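A minimal end-to-end check of the lapply approach, rebuilding the question's data. The seed and the extra `label` column are illustrative additions, not part of the answer. Note the rounding: with 574 rows, 25% is 143.5, so the `<=` comparison flags 143 rows as TRUE in each data frame.

```r
# Rebuild the example data from the question (seed added for reproducibility)
vec.1 <- 1:574
vec.2 <- 3001:3574
df.1 <- data.frame(vec.1, vec.2)
set.seed(1)
my_list <- replicate(10, df.1[sample(nrow(df.1)), ], simplify = FALSE)

# myFun from the answer: TRUE for the first 25% of rows, FALSE otherwise
myFun <- function(indf) {
  indf$vec.3 <- seq_len(nrow(indf)) <= .25 * nrow(indf)
  indf
}

flagged <- lapply(my_list, myFun)

# Number of TRUE rows per data frame: 574 * 0.25 = 143.5, so rows 1..143 qualify
n_true <- sapply(flagged, function(d) sum(d$vec.3))
print(n_true)

# Optional: character labels "T"/"F" instead of a logical column
flagged[[1]]$label <- ifelse(flagged[[1]]$vec.3, "T", "F")
head(flagged[[1]])
```

If you need the literal "T"/"F" strings in every data frame, the ifelse() line can be moved inside myFun.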

Answer 2

If this is the start of a cross-validation procedure, you can use the modelr package to do the following:

library(modelr)
dat <- crossv_mc(df.1, 10, test = 0.25)

Now dat looks like this:

# A tibble: 10 × 3
            train           test   .id
           <list>         <list> <chr>
1  <S3: resample> <S3: resample>    01
2  <S3: resample> <S3: resample>    02
...
10 <S3: resample> <S3: resample>    10

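So you have one list-column, train, holding 75% of the data, and another, test, holding the remaining 25% as test data. This is equivalent to your T/F split. Because crossv_mc() draws the test rows at random, there is no need to shuffle the rows yourself beforehand.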
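For comparison, here is a base-R sketch of the same kind of random 75/25 split, without modelr. The helper `flag_test` is hypothetical and only approximates what crossv_mc() does; it stores the split as a logical column (TRUE = test) so it lines up with the T/F labelling in the question.

```r
vec.1 <- 1:574
vec.2 <- 3001:3574
df.1 <- data.frame(vec.1, vec.2)

# Hypothetical helper (not part of modelr): mark a random 25% of rows as test
flag_test <- function(df, test = 0.25) {
  n_test <- floor(test * nrow(df))             # 143 rows when nrow(df) is 574
  test_rows <- sample(nrow(df), n_test)        # row positions drawn without replacement
  df$test <- seq_len(nrow(df)) %in% test_rows  # TRUE = test, FALSE = train
  df
}

set.seed(42)
splits <- replicate(10, flag_test(df.1), simplify = FALSE)
n_test <- sapply(splits, function(d) sum(d$test))
print(n_test)
```

Unlike the modelr version, every data frame here keeps all its rows and only gains a flag column, which is closer to the layout the question asks for.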

You can use it as follows (adapted from the example in ?crossv_mc).

Add a new column holding the fitted models:

dat$mod <- lapply(dat$train, function(x){
  lm(vec.1 ~ vec.2, data = as.data.frame(x))
})

The important part is as.data.frame(x): use it whenever you need to access the data itself. See ?resample.

Use the test data to compute some statistics on the models:

mapply(rmse, dat$mod, dat$test)
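To see what the rmse step computes, here is a hand-rolled sketch on a single base-R split: fit on the training rows, then take the square root of the mean squared prediction error on the held-out rows. The split and the variable names are illustrative assumptions. Since vec.1 is exactly vec.2 - 3000 in the example data, the RMSE comes out essentially zero.

```r
vec.1 <- 1:574
vec.2 <- 3001:3574
df.1 <- data.frame(vec.1, vec.2)

set.seed(7)
# Illustrative random 75/25 split stored as a logical column (TRUE = test)
n_test <- floor(0.25 * nrow(df.1))
df.1$test <- seq_len(nrow(df.1)) %in% sample(nrow(df.1), n_test)

# Fit on the training rows only
mod <- lm(vec.1 ~ vec.2, data = df.1[!df.1$test, ])

# RMSE on the held-out rows: square root of the mean squared prediction error
test_df <- df.1[df.1$test, ]
rmse_test <- sqrt(mean((test_df$vec.1 - predict(mod, newdata = test_df))^2))
print(rmse_test)
```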

Answer 3

Adapted from the answer by #SirSaleh here.

sensitivity.rand <- function(vector, threshold) {
  # threshold is a percentage, e.g. 25 labels the first 25% of entries "T"
  num_to_thres <- floor(threshold * 0.01 * length(vector))
  l <- length(vector)
  score <- c(rep("T", num_to_thres), rep("F", l - num_to_thres))
  return(score)
}

This version works with any threshold, given as a percentage.
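A sketch of applying sensitivity.rand to every data frame in the list. The column name `flag` and the choice of vec.1 as the input vector are arbitrary; any column of the right length works, since the function only uses its length. With a threshold of 25 and 574 rows, floor(143.5) gives 143 "T" rows per data frame.

```r
vec.1 <- 1:574
vec.2 <- 3001:3574
df.1 <- data.frame(vec.1, vec.2)
set.seed(3)
my_list <- replicate(10, df.1[sample(nrow(df.1)), ], simplify = FALSE)

# sensitivity.rand from the answer: first `threshold` percent labelled "T", rest "F"
sensitivity.rand <- function(vector, threshold) {
  num_to_thres <- floor(threshold * 0.01 * length(vector))
  l <- length(vector)
  c(rep("T", num_to_thres), rep("F", l - num_to_thres))
}

# Add the labels to every data frame (the column name `flag` is illustrative)
labelled <- lapply(my_list, function(d) {
  d$flag <- sensitivity.rand(d$vec.1, 25)
  d
})

table(labelled[[1]]$flag)
```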

List of data frames: assign the first 25% of rows as T and the others as F
