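I need to select random rows from different numeric ranges that I specify. The following topic is closely related, but there the rows are selected per factor level:
selecting n random rows across all levels of a factor within a dataframe
Using the same example: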
df <- data.frame(matrix(rnorm(80), nrow=40))
df$color <- rep(c("blue", "red", "yellow", "pink"), each=10)
How can I select 4 rows (or any other n) where -1 < X1 < 0, and 4 rows where 0 ≤ X1 < 2?
Answer 0 (score: 2)
Try this:
n <- 4
indx1 <- with(df, which(X1>-1 & X1 <0))
indx2 <- with(df, which(X1>=0 & X1 <2))
df[sample(indx1,n,replace=FALSE),]
df[sample(indx2,n,replace=FALSE),]
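The two calls above can be folded into one reusable function. The sketch below is an illustration rather than part of the original answer: `sample_in_range` and its `include_lower` argument are hypothetical names. It also guards against two edge cases of `sample()`: fewer than `n` matching rows (which would raise an error), and exactly one matching index `k` (which `sample()` would treat as `1:k`).

```r
# Hypothetical helper (not from the original answer): sample up to n rows of
# `data` whose column `col` lies in (lower, upper), or in [lower, upper) when
# include_lower = TRUE.
sample_in_range <- function(data, col, lower, upper, n, include_lower = FALSE) {
  x <- data[[col]]
  keep <- if (include_lower) x >= lower & x < upper else x > lower & x < upper
  idx <- which(keep)  # which() also drops NA comparisons
  # With n or fewer matches, return all of them. This also avoids
  # sample()'s behaviour of treating a single number k as 1:k.
  if (length(idx) <= n) return(data[idx, , drop = FALSE])
  data[idx[sample.int(length(idx), n)], , drop = FALSE]
}

set.seed(1)
df <- data.frame(matrix(rnorm(80), nrow = 40))
df$color <- rep(c("blue", "red", "yellow", "pink"), each = 10)

r1 <- sample_in_range(df, "X1", -1, 0, n = 4)                        # -1 < X1 < 0
r2 <- sample_in_range(df, "X1", 0, 2, n = 4, include_lower = TRUE)   # 0 <= X1 < 2
```

Returning all matching rows when fewer than `n` are available is a design choice made here; raising an error instead may suit you better if a short sample should never pass silently.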
If you need to select a sample of 'n' rows for each level of the grouping variable 'color', based on a condition on the 'X1' variable:
library(data.table) # v1.9.5+
# between() includes both bounds by default; pass incbounds=FALSE for a strict -1 < X1 < 0
setDT(df)[between(X1, -1, 0), if (n > .N) .SD else
  .SD[sample(.N, n, replace=FALSE)], by = color]
You can do the same for the second condition on 'X1'.
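For the second range, a sketch along the same lines might look as follows. It assumes the half-open interval 0 ≤ X1 < 2 from the question, so it spells the condition out with explicit comparisons instead of `between()`, which would include the upper bound by default. The name `res2` and the seed are illustrative only.

```r
library(data.table)

set.seed(1)
df <- data.frame(matrix(rnorm(80), nrow = 40))
df$color <- rep(c("blue", "red", "yellow", "pink"), each = 10)

n <- 4
# Keep rows with 0 <= X1 < 2, then within each colour take n random rows,
# or every row of that colour when it has fewer than n matches.
res2 <- setDT(df)[X1 >= 0 & X1 < 2,
                  if (n > .N) .SD else .SD[sample(.N, n, replace = FALSE)],
                  by = color]
```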
Answer 1 (score: 0)
set.seed(1234)
df <- data.frame(matrix(rnorm(80), nrow=40))
df$color <- rep(c("blue", "red", "yellow", "pink"), each=10)
s1 = subset(df,df$X1<0 & df$X1 > -1)
s2 = subset(df,df$X1<2 & df$X1 >= 0)
r1 = s1[sample(nrow(s1), 4), ]
r2 = s2[sample(nrow(s2), 4), ]
> r1
X1 X2 color
18 -0.9111954 -0.7733534 red
22 -0.4906859 2.5489911 yellow
17 -0.5110095 1.6478175 red
11 -0.4771927 -1.8060313 red
> r2
X1 X2 color
2 0.2774292 -1.068642724 blue
15 0.9594941 -0.162309524 red
6 0.5060559 -0.968514318 blue
31 1.1022975 0.006892838 pink
Answer 2 (score: 0)
Using dplyr:
library(dplyr)
df %>%
filter(X1 > -1 & X1 < 0) %>%
sample_n(4)
df %>%
filter(X1 >= 0 & X1 < 2) %>%
sample_n(4)
You can abstract out the number of rows to select like this:
num_to_select <- 4
df %>%
filter(X1 > -1 & X1 < 0) %>%
sample_n(num_to_select)
Similarly, you can do the same for the lower and upper cutoffs:
num_to_select <- 4
lower_cutoff <- -1
upper_cutoff <- 0
df %>%
filter(X1 > lower_cutoff & X1 < upper_cutoff) %>%
sample_n(num_to_select)
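Taking this one step further, the three parameters can become arguments of a function. This is only a sketch: `sample_between` is a hypothetical name, it assumes dplyr 1.0.0 or later (where `slice_sample()` supersedes `sample_n()`), and it uses a half-open interval, lower ≤ X1 < upper, which differs from the strict comparison shown above.

```r
library(dplyr)

# Hypothetical wrapper (not from the original answer): sample up to `size`
# rows with lower <= X1 < upper. Unlike sample_n(), slice_sample() returns
# all matching rows when fewer than `size` are available.
sample_between <- function(data, lower, upper, size) {
  data %>%
    filter(X1 >= lower & X1 < upper) %>%
    slice_sample(n = size)
}

set.seed(1)
df <- data.frame(matrix(rnorm(80), nrow = 40))
df$color <- rep(c("blue", "red", "yellow", "pink"), each = 10)

low  <- sample_between(df, -1, 0, size = 4)
high <- sample_between(df, 0, 2, size = 4)
```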