Question

I want to subset a large data frame by groups of 100 rows, to feed into a function.

A simplified example: Here's my "large" data frame of 1000 rows.

df<-data.frame(c(sample(2:100,1000,replace=TRUE)),c(sample(2:100,1000,replace=TRUE)))

I need to feed each group of 100 rows from df[,1] into this dummy function:

dummy<-function(x){
return(c("There are ",x," dummies in this room"))
}

I need to do this in sets of 100 because the dummy function can only handle 100 values at once.

This will feed the entirety of df[,1] into the function:

lapply(df[,1],dummy)

But instead, I need something like this:

lapply(df[1:100,1],dummy)
lapply(df[101:200,1],dummy)
. . . etc

How do I do this succinctly, preferably with base R?

Answer 1

If your dataset has no factor variable to use with split, or you don't want to go the vector route with cut, a short loop like this may be enough:

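df<-data.frame(c(sample(2:100,1000,replace=TRUE)),c(sample(2:100,1000,replace=TRUE)))
batches<-list()   # one list element per batch; avoids masking base::sample
div<-seq(100,nrow(df),100)   # last row of each batch: 100, 200, ...
for(i in seq_along(div))
{
    # rows 1-100, 101-200, ...; starting at 100*(i-1)+1 keeps batches from overlapping
    batches[[i]]<-df[(100*(i-1)+1):div[i],]
}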
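
To connect the batching back to the function in the question, here is a minimal sketch that calls dummy once per batch of the first column. The column names a and b and the variables batch_size, starts and results are illustrative assumptions, not part of the original post; set.seed is there only to make the sketch reproducible.

```r
# Illustrative data, same shape as in the question (column names are assumed)
set.seed(1)
df <- data.frame(a = sample(2:100, 1000, replace = TRUE),
                 b = sample(2:100, 1000, replace = TRUE))

# The dummy function from the question
dummy <- function(x) {
  c("There are ", x, " dummies in this room")
}

batch_size <- 100
# First row of each batch: 1, 101, 201, ...
starts <- seq(1, nrow(df), by = batch_size)

# Call dummy() once per batch of column 1. min() caps the last batch at
# nrow(df) in case the row count is not a multiple of batch_size.
results <- lapply(starts, function(s) {
  dummy(df[s:min(s + batch_size - 1, nrow(df)), 1])
})

length(results)   # number of batches
```

Working from start indices rather than a preallocated list keeps the loop body to a single expression, and the same code should still behave sensibly if the final batch is short.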

Answer 2

As @A Webb suggested, split helps here.

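df<-data.frame(c(sample(2:100,1000,replace=TRUE)),
               c(sample(2:100,1000,replace=TRUE)))

# For sequential grouping (10 groups of 100 consecutive rows)
groups<-10 
groups_split<-split(df, factor(sort(rank(row.names(df))%%groups)))

# Or, for random groups of exactly 100 rows each
groups_split<-split(df, sample(rep(1:groups, length.out=nrow(df))))

# yourfunc is a placeholder for the function applied to each group
sapply(groups_split, yourfunc)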

There may be more efficient approaches; I'd be glad to see other answers.
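
One possibly simpler way to build the sequential grouping, offered as a sketch rather than a definitive answer: compute a batch id per row with ceiling and split the column vector directly, so each list element is a plain vector ready for dummy. The names a, b, batch_size, batch_id, batches and results are assumptions made for illustration.

```r
# Illustrative data, same shape as in the question (column names are assumed)
set.seed(1)
df <- data.frame(a = sample(2:100, 1000, replace = TRUE),
                 b = sample(2:100, 1000, replace = TRUE))

# The dummy function from the question
dummy <- function(x) {
  c("There are ", x, " dummies in this room")
}

batch_size <- 100
# Batch id per row: 1 for rows 1-100, 2 for rows 101-200, and so on
batch_id <- ceiling(seq_len(nrow(df)) / batch_size)

# Splitting the column (not the whole data frame) gives a list of vectors
batches <- split(df[, 1], batch_id)

# One call to dummy() per batch
results <- lapply(batches, dummy)

lengths(batches)   # size of each batch
```

Because the batch id is derived from the row position, a row count that is not a multiple of 100 simply yields a shorter last batch.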

Subset data frame in batches of 100 rows
