Remove rows from a data frame until a condition is met

Time: 2017-07-17 22:59:58

Tags: r loops conditional

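I have a function, remove_fun, that removes rows from a data frame based on certain conditions (the function itself is too long to post, so here is a simplified example).

Suppose I have a data frame named block_2 with two columns: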

 Treatment seq
       1   29
       1   23
       3   60
       1   6
       2   41
       1   5
       2   44

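For this example, suppose my function removes one row at a time from block_2, namely the row with the highest value of seq. A single call works fine, i.e. remove_fun(block_2) returns the following output: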

Treatment seq
   1      29
   1      23
   1      6
   2      41
   1      5
   2      44

What I can't figure out is how to apply remove_fun repeatedly until block_2 has been reduced to a certain dimension.

My idea was to do something like this:

# Loop while block_2_df has more than one row
while (dim(block_2_df)[1]>1) {
  remove_fun(block_2_df)
}

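In theory this would shrink block_2_df until only the observation with the lowest seq number remains.

However, it doesn't work. I think the problem is that I don't know how to use my 'updated' block_2_df iteratively. What I want to accomplish is something like this: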

new_df_1<-remove_fun(block_2)
new_df_2<-remove_fun(new_df_1)
new_df_3<-remove_fun(new_df_2)

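and so on...

I'm not necessarily looking for an exact solution to this problem (since I haven't provided remove_fun), but I'd appreciate some insight into a general approach.

Edit: here is my actual code, with some sample data: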

#Start from a block of 10*6 balls, with lambda*(wj) balls of each class
#Allocation ratios
class_1<-"a"
class_2<-"b"
class_3<-"c"

ratio_a<-3
ratio_b<-2
ratio_c<-1
#Min_set
min_set<-c(rep(class_1,ratio_a),rep(class_2,ratio_b),rep(class_3,ratio_c))
min_set_num<-ifelse(min_set=='a',1,ifelse(min_set=='b',2,3))

table_key <- table(min_set_num)

#Number of min_sets
lamb<-10
#Active urn
block_1<-matrix(0,lamb,length(min_set))
for (i in 1:lamb){
  block_1[i,]<-min_set
}

#Turn classes into a vector
block_1<-as.vector(block_1)
block_1<-ifelse(block_1=='a',1,ifelse(block_1=='b',2,3))
#Turn into a df w/ identifying numbers:
block_1_df<-data.frame(block_1,seq(1:length(block_1)))
#Enumerate all sampling outcome permutations
library('dplyr')
#Create inactive urn
#Sample from block_1 until min_set is achieved, store in block_2#####
#Random sample :
block_2<-sample(block_1,length(block_1),replace=F)

block_2_df<-block_1_df[sample(nrow(block_1_df), length(block_1)), ]
colnames(block_2_df)<-c('Treatment','seq')
#Generally:####

remove_fun<-function(dat){
  #For df
  min_set_obs_mat<-matrix(0,length(block_1),2)
  min_set_obs_df<-as.data.frame(min_set_obs_mat)
  colnames(min_set_obs_df)<-c('Treatment','seq')

  for (i in 1:length(block_1)){
    if ((sum(min_set_obs_df[,1]==1)<3) || (sum(min_set_obs_df[,1]==2)<2) || (sum(min_set_obs_df[,1]==3)<1)){
      min_set_obs_df[i,]<-dat[i,]
    }
  }
  #Get rid of empty rows in df:
  min_set_obs_df<-min_set_obs_df%>%filter(Treatment>0)

  #Return the sampled 'balls' which satisfy the minimum set into block_2_df (randomized block_!), ####
  #keeping the 'extra' balls in a new df: extra_df:####

  #Question: does the order of returning matter?####

  #Identify min_set
  outcome_df<-min_set_obs_df %>% group_by(Treatment) %>% do({
    head(., coalesce(table_key[as.character(.$Treatment[1])], 0L))
  })

  #This removes extra observations 'chronologically'
  #Identify extra balls
  #Extra_df is the 'inactive' urn####
  extra_df<-min_set_obs_df%>%filter(!(min_set_obs_df$seq%in%outcome_df$seq))
  #Question: is the number of pts equal to the block size? (lambda*W)?######

  #Return min_df back to block_2_df, remove extra_df from block_2_df:
  dat<-dat%>%filter(!(seq%in%extra_df$seq))

  return(dat)
}

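2 answers:

Answer 0: (score: 1)

Your while loop never reassigns block_2_df, so the condition is checked against the same unchanged data frame on every pass. This should work: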

while (dim(block_2_df)[1]>1) {
  block_2_df <- remove_fun(block_2_df)
}
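
As a minimal sketch of that reassignment pattern: the real remove_fun is not reproduced here, so drop_max_row below is a hypothetical stand-in that drops the row with the largest seq. The target_rows name and the no-progress guard are also illustrative assumptions, added because a removal function that leaves the data frame unchanged would otherwise make the loop run forever.

```r
# Hypothetical stand-in for remove_fun: drops the row with the largest seq
drop_max_row <- function(dat) {
  dat[-which.max(dat$seq), , drop = FALSE]
}

block_2 <- data.frame(Treatment = c(1, 1, 3, 1, 2, 1, 2),
                      seq = c(29, 23, 60, 6, 41, 5, 44))

target_rows <- 1  # assumed stopping size
result <- block_2
while (nrow(result) > target_rows) {
  reduced <- drop_max_row(result)
  # Guard: stop if nothing was removed, to avoid an infinite loop
  if (nrow(reduced) == nrow(result)) break
  result <- reduced
}
result
```

With this stand-in, the loop ends with the single row that has the lowest seq. Swapping in the real function only requires replacing drop_max_row in the loop body.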

Answer 1: (score: 0)

If all you need is a way to subset the data frame...

df <- data.frame(Treatment = c(1, 1, 3, 1, 2, 1, 2),
                 seq = c(29, 23, 60, 6, 41, 5, 44))

df
  Treatment seq
1         1  29
2         1  23
3         3  60
4         1   6
5         2  41
6         1   5
7         2  44

# Decide how many rows you want in output

n <- 6

# Find the n smallest values in the seq variable

head(sort(df$seq), n)
[1]  5  6 23 29 41 44


# Use them in the subset criteria

df[df$seq %in% head(sort(df$seq), n), ]
  Treatment seq
1         1  29
2         1  23
4         1   6
5         2  41
6         1   5
7         2  44
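
One caveat with the %in% approach: if seq contains tied values, it can return more than n rows. A possible variant, sketched here with base R's order() and the illustrative names keep and subset_df, selects exactly n row positions instead and then restores the original row order:

```r
df <- data.frame(Treatment = c(1, 1, 3, 1, 2, 1, 2),
                 seq = c(29, 23, 60, 6, 41, 5, 44))
n <- 6

# Positions of the n smallest seq values, put back into original row order
keep <- sort(order(df$seq)[seq_len(min(n, nrow(df)))])
subset_df <- df[keep, ]
subset_df
```

On this sample data, which has no ties, the result matches the %in% version: every row except the one with seq 60.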