Selecting n random rows within arbitrary intervals

Time: 2015-06-11 15:37:26

Tags: r

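I need to select random rows from different numeric intervals that I specify. The following topic is closely related, but there the rows are selected per factor level: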

selecting n random rows across all levels of a factor within a dataframe

Using the same example:

df <- data.frame(matrix(rnorm(80), nrow=40))
df$color <- rep(c("blue", "red", "yellow", "pink"), each=10)

How can I select 4 rows (or any other n) where -1 < X1 < 0, and 4 rows where 0 ≤ X1 < 2?

3 answers:

Answer 0: (score: 2)

Try this:

 n <- 4
 indx1 <- with(df, which(X1>-1 & X1 <0))
 indx2 <- with(df, which(X1>=0 & X1 <2))
 df[sample(indx1,n,replace=FALSE),]
 df[sample(indx2,n,replace=FALSE),]
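
The snippet above assumes each interval contains at least two matching rows and at least n of them. When the index vector has length 1, `sample()` treats the single number as an upper bound and draws from `1:x`; when it is shorter than n, `sample()` stops with an error. A minimal sketch of a guard, where `safe_sample_idx` is a hypothetical helper name and not part of the original answer:

```r
# Hypothetical helper: draw up to n elements from an index vector,
# guarding against the two edge cases of sample() described above.
safe_sample_idx <- function(idx, n) {
  k <- min(n, length(idx))           # never ask for more rows than matched
  idx[sample.int(length(idx), k)]    # sample positions, so a length-1 idx is safe
}

set.seed(42)
df <- data.frame(matrix(rnorm(80), nrow = 40))
indx1 <- which(df$X1 > -1 & df$X1 < 0)
picked <- df[safe_sample_idx(indx1, 4), ]
```

If fewer than n rows fall in the interval, this returns all of them instead of failing; whether that or an error is preferable depends on the use case.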

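Update

If you need to select a sample of 'n' rows for each level of the grouping variable 'color', subject to the condition on the 'X1' variable: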

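library(data.table) # v1.9.5+
# between() includes both endpoints by default; incbounds = FALSE keeps -1 < X1 < 0
setDT(df)[between(X1, -1, 0, incbounds = FALSE), if (n > .N) .SD else
           .SD[sample(.N, n, replace = FALSE)], by = color]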

You can apply the second condition on 'X1' in the same way.
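
For the second condition, 0 ≤ X1 < 2, a sketch along the same lines might look like the following. It assumes a fresh illustrative table `dt` with the same columns as the question's `df`, and it writes the condition out explicitly because the lower bound is inclusive while the upper bound is not. Using `min(n, .N)` is an alternative to the `if`/`else` above, not the answer's own code.

```r
library(data.table)

set.seed(1234)
# `dt` is an illustrative table with the same layout as df in the question
dt <- data.table(
  X1 = rnorm(40),
  X2 = rnorm(40),
  color = rep(c("blue", "red", "yellow", "pink"), each = 10)
)

n <- 4
# Keep rows with 0 <= X1 < 2, then draw up to n rows within each colour
res2 <- dt[X1 >= 0 & X1 < 2, .SD[sample(.N, min(n, .N))], by = color]
```

Colours with no rows in the interval simply do not appear in `res2`.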

Answer 1: (score: 0)

set.seed(1234)
df <- data.frame(matrix(rnorm(80), nrow=40))
df$color <-  rep(c("blue", "red", "yellow", "pink"), each=10)

s1 =  subset(df,df$X1<0 & df$X1 > -1)
s2 =  subset(df,df$X1<2 & df$X1 >= 0)

r1 = s1[sample(nrow(s1), 4), ]
r2 = s2[sample(nrow(s2), 4), ]

> r1
           X1         X2  color
18 -0.9111954 -0.7733534    red
22 -0.4906859  2.5489911 yellow
17 -0.5110095  1.6478175    red
11 -0.4771927 -1.8060313    red
> r2
          X1           X2 color
2  0.2774292 -1.068642724  blue
15 0.9594941 -0.162309524   red
6  0.5060559 -0.968514318  blue
31 1.1022975  0.006892838  pink

Answer 2: (score: 0)

Using dplyr:

library(dplyr)

df %>%
  filter(X1 > -1 & X1 < 0) %>%
  sample_n(4)


df %>%
  filter(X1 >= 0 & X1 < 2) %>%
  sample_n(4)

You can abstract the number of rows to select like this:

num_to_select <- 4

df %>%
  filter(X1 > -1 & X1 < 0) %>%
  sample_n(num_to_select)

Similarly, you can do the same for the lower and upper cutoffs:

num_to_select <- 4
lower_cutoff  <- -1
upper_cutoff  <- 0

df %>%
  filter(X1 > lower_cutoff & X1 < upper_cutoff) %>%
  sample_n(num_to_select)
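
Taking the parameterisation one step further, the cutoffs and the row count could be wrapped in a function so that both intervals from the question reuse the same code. This is only a sketch: `sample_in_interval` and its `include_lower` argument are hypothetical names, and it uses `slice_sample()` (dplyr 1.0.0 and later, where `sample_n()` is superseded), which returns all matching rows when fewer than n are available instead of raising an error.

```r
library(dplyr)

# Hypothetical helper: sample up to n rows whose X1 lies in the interval.
# include_lower = TRUE turns (lower, upper) into [lower, upper).
sample_in_interval <- function(data, lower, upper, n, include_lower = FALSE) {
  in_range <- if (include_lower) {
    data$X1 >= lower & data$X1 < upper
  } else {
    data$X1 > lower & data$X1 < upper
  }
  data %>%
    filter(in_range) %>%
    slice_sample(n = n)
}

set.seed(1234)
df <- data.frame(matrix(rnorm(80), nrow = 40))
df$color <- rep(c("blue", "red", "yellow", "pink"), each = 10)

r1 <- sample_in_interval(df, -1, 0, 4)                        # -1 < X1 < 0
r2 <- sample_in_interval(df, 0, 2, 4, include_lower = TRUE)   # 0 <= X1 < 2
```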