I want to subset a large data frame by groups of 100 rows, to feed into a function.
A simplified example: Here's my "large" data frame of 1000 rows.
df<-data.frame(c(sample(2:100,1000,replace=TRUE)),c(sample(2:100,1000,replace=TRUE)))
I need to feed each group of 100 rows from df[,1] into this dummy function:
dummy<-function(x){
return(c("There are ",x," dummies in this room"))
}
I need to do this in sets of 100 because the dummy function can only handle 100 values at once.
This will feed the entirety of df[,1] into the function:
lapply(df[,1],dummy)
But instead, I need something like this:
lapply(df[1:100,1],dummy)
lapply(df[101:200,1],dummy)
. . . etc
How do I do this in a succinct way, preferably with base R?
Answer 0 (score: 3)
If your dataset has no factor variable to use with split, or you don't want to go the vector route with cut, a short loop like this may be enough:
df<-data.frame(c(sample(2:100,1000,replace=TRUE)),c(sample(2:100,1000,replace=TRUE)))
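# Collect each block of 100 rows in a list (named chunks to avoid masking sample())
chunks<-list()
div<-seq(100,nrow(df),100)
for(i in seq_along(div))
{
# Start 99 rows before the block end so consecutive blocks do not overlap
chunks[[i]]<-df[(div[i]-99):div[i],]
}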
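To get from blocks of rows to actual function calls, the same idea can be written without a growing list. This is only a minimal sketch: the column names a and b, the helper names starts and results, and the set.seed call are my own additions for illustration, and dummy is copied from the question.

```r
set.seed(1)  # assumption: fixed seed only so the sketch is reproducible
df <- data.frame(a = sample(2:100, 1000, replace = TRUE),
                 b = sample(2:100, 1000, replace = TRUE))

# dummy() as defined in the question
dummy <- function(x) {
  return(c("There are ", x, " dummies in this room"))
}

# First row of every block: 1, 101, 201, ...
starts <- seq(1, nrow(df), by = 100)

# Feed the first column of each block of (at most) 100 rows to dummy()
results <- lapply(starts, function(s) {
  rows <- s:min(s + 99, nrow(df))  # min() guards a shorter final block
  dummy(df[rows, 1])
})

length(results)  # one result per block
```

Using min() on the block end means this should also cope with a row count that is not a multiple of 100, in which case the last block is simply shorter.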
Answer 1 (score: 0)
As @A Webb suggested, using split will help.
df<-data.frame(c(sample(2:100,1000,replace=TRUE)),
c(sample(2:100,1000,replace=TRUE)))
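# Sequential grouping: rows 1-100, 101-200, ...
groups<-10
groups_split<-split(df, factor(sort(rank(row.names(df))%%groups)))
# Random grouping: each row gets a random group, so group sizes are only roughly 100
groups_random<-split(df, sample(1:groups, nrow(df), replace=TRUE))
# Apply your function (yourfunc is a placeholder) to each group
sapply(groups_split, yourfunc)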
There may be more efficient ways; I'd be glad to see other answers.
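For what it's worth, the sequential grouping index for split can probably be built more directly with integer arithmetic on the row positions. A rough sketch, assuming you only need the first column; the names grp, pieces and out and the column names a and b are made up for illustration, and dummy is the function from the question:

```r
df <- data.frame(a = sample(2:100, 1000, replace = TRUE),
                 b = sample(2:100, 1000, replace = TRUE))

# dummy() as defined in the question
dummy <- function(x) {
  return(c("There are ", x, " dummies in this room"))
}

# Block number for every row: 1 for rows 1-100, 2 for rows 101-200, ...
grp <- ceiling(seq_len(nrow(df)) / 100)

# Split the first column into a list of vectors, one per block
pieces <- split(df[, 1], grp)

# Call dummy() once per block
out <- lapply(pieces, dummy)
```

Splitting the column rather than the whole data frame means each list element is already the plain vector that dummy expects.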