Question

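I want to subset a data.frame so that it keeps the bottom 99.5% of the values within each level of a categorical variable.

My data has two columns: minutes used (minutes) and location (location).

For each location, I want to remove the top 0.5% of the minutes values.

The new subset would contain the values up to the 99.5th percentile for location 1, the values up to the 99.5th percentile for location 2, and so on.

Thanks!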

Answer 1

This may solve your problem, but it would be very helpful if you posted your data.

library(plyr)

# Add a column holding each location's 99.5th-percentile cutoff.
# Column names (location, minutes.used) are assumed; adjust them to your data.
new.dataset1 <- ddply(your.dataset, "location", mutate,
                      minutes.99.5.cutoff = quantile(minutes.used, 0.995))

# Keep only the rows at or below their location's cutoff (the bottom 99.5%),
# then select the first two columns (assumed here to be location and
# minutes.used)
trimmed.dataset <- new.dataset1[which(new.dataset1$minutes.used <=
                                      new.dataset1$minutes.99.5.cutoff), 1:2]
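
If you would rather not depend on plyr, the same per-location trimming can be sketched in base R with ave(). This is only an illustration under assumptions: the sample data below is made up, and the column names location and minutes.used follow the answer rather than the asker's real data, which was not posted.

```r
# Hypothetical sample data: two locations, 1000 rows each.
set.seed(42)
your.dataset <- data.frame(
  location = rep(c("loc1", "loc2"), each = 1000),
  minutes.used = c(rexp(1000, rate = 1 / 30), rexp(1000, rate = 1 / 60))
)

# ave() applies FUN within each location and recycles the single cutoff
# value back onto every row of that location.
cutoff <- ave(your.dataset$minutes.used, your.dataset$location,
              FUN = function(x) quantile(x, 0.995))

# Keep the rows at or below their own location's 99.5th percentile.
trimmed.dataset <- your.dataset[your.dataset$minutes.used <= cutoff, ]

# Row counts per location after trimming
table(trimmed.dataset$location)
```

Because the cutoff is computed per group, a location with generally high usage does not cause rows from a low-usage location to be kept or dropped. If minutes.used can contain NA values, quantile() will stop with an error unless na.rm = TRUE is passed, so decide first how missing values should be treated.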
