
Asked: 2011-05-06 10:20:52

Tags: r


> mydata
date  station  treatment  subject   par
A       a         0         R1      1.3    
A       a         0         R1      1.4    
A       a         1         R2      1.4   
A       a         1         R2      1.1    
A       b         0         R1      1.5    
A       b         0         R1      1.8     
A       b         1         R2      2.5     
A       b         1         R2      9.5    
B       a         0         R1      0.3    
B       a         0         R1      8.2    
B       a         1         R2      7.3    
B       a         1         R2      0.2    
B       b         0         R1      9.4    
B       b         0         R1      3.2    
B       b         1         R2      3.5    
B       b         1         R2      2.4 


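date is a factor with 2 levels, A/B; station is a factor with 2 levels, a/b; treatment is a factor with 2 levels, 0/1;

subject is the replicate, R1 to R20, assigned to the treatments (10 to treatment 0, 10 to treatment 1);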


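What I need to do is: divide par into 10 equal-width bins and count the number of observations in each bin. This must be done within subsets of mydata defined by the combination of date, station and subject. The final result must be a data frame myres like the following: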

> myres
    date  station  treatment  bin.centre  freq
    A       a         0         1.2        4 
    A       a         0         1.3        3    
    A       a         0         1.4        2 
    A       a         0         1.5        1    
    A       a         1         1.2        4    
    A       a         1         1.3        3    
    A       a         1         1.4        2     
    A       a         1         1.5        1    
    B       b         0         2.3        5   
    B       b         0         2.4        4    
    B       b         0         2.5        3    
    B       b         0         2.6        2   
    B       b         1         2.3        5   
    B       b         1         2.4        4   
    B       b         1         2.5        3   
    B       b         1         2.6        2


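# NOTE: the assignments under most comments were missing from this copy of the post;
# they are reconstructed from the comments, and num.bins is an assumed name

#define the number of bins
num.bins<-10

#define the width of each bin
bin.width<-(max(par)-min(par))/num.bins

#define the lower and upper boundaries of each bin
bins<-seq(from=min(par), to=max(par), by=bin.width)

#define the centre of each bin
bin.centre<-bins[-length(bins)]+bin.width/2

#create a vector to store the frequency in each bin
freq<-numeric(num.bins)

# this is the loop that counts the frequency of particles between the lower and upper boundaries
# of each bin and stores the result in freq (the last bin also includes max(par))

 for(i in 1:num.bins){
    freq[i]<-length(which(par>=bins[i] & (par<bins[i+1] | i==num.bins)))
 }

 #create the data frame with the results
 res<-data.frame(bin.centre, freq)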

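My first approach was to use subset() to manually subset mydata for each combination of subject, station and date, apply the above sequence of commands to each subset, and then build the final data frame by combining the res obtained from each one. This procedure is very convoluted, though, and prone to error propagation.

What I would like to do is automate the above procedure so that the binned frequency distribution is calculated for each subject. My intuition is that the best way is to write a function that estimates this particle distribution and then apply it to each subject with a for loop. However, I don't know how to do that. Any suggestion would be much appreciated.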
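The "function plus loop over subsets" idea described above can be sketched in base R roughly as follows. This is only an illustration under assumptions: the helper name bin_freq is hypothetical, mydata is rebuilt from the rows printed in the question, and the grouping uses date, station and treatment because those are the columns shown in myres.

```r
# The 16 rows of mydata printed in the question
mydata <- data.frame(
  date      = rep(c("A", "B"), each = 8),
  station   = rep(rep(c("a", "b"), each = 4), times = 2),
  treatment = rep(rep(c(0, 1), each = 2), times = 4),
  subject   = rep(rep(c("R1", "R2"), each = 2), times = 4),
  par       = c(1.3, 1.4, 1.4, 1.1, 1.5, 1.8, 2.5, 9.5,
                0.3, 8.2, 7.3, 0.2, 9.4, 3.2, 3.5, 2.4)
)

# Hypothetical helper: counts of `par` in `num.bins` equal-width bins.
# Assumes `par` holds at least two distinct values.
bin_freq <- function(par, num.bins = 10) {
  # num.bins + 1 boundaries running from min(par) to max(par)
  bins <- seq(min(par), max(par), length.out = num.bins + 1)
  bin.centre <- (bins[-1] + bins[-length(bins)]) / 2
  # [lower, upper) bins; include.lowest = TRUE closes the last bin so that
  # max(par) is counted; labels = FALSE returns the bin index
  idx <- cut(par, breaks = bins, right = FALSE,
             include.lowest = TRUE, labels = FALSE)
  data.frame(bin.centre, freq = tabulate(idx, nbins = num.bins))
}

# One data frame per observed date/station/treatment combination
groups <- split(mydata, mydata[c("date", "station", "treatment")], drop = TRUE)

# Apply the helper to each subset and attach the subset's identifiers
per_group <- lapply(groups, function(d) {
  data.frame(date = d$date[1], station = d$station[1],
             treatment = d$treatment[1], bin_freq(d$par))
})

# Stack the per-subset results into a single data frame
myres <- do.call(rbind, per_group)
rownames(myres) <- NULL
head(myres)
```

Here the bin boundaries are recomputed inside every subset, as in the manual procedure, so bin centres differ between groups. Computing the boundaries once from the full par vector and passing them into the helper would make the groups directly comparable.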

Thanks, 利玛.

1 Answer:

Answer 0 (score: 4)



# simulate example data with the same columns as mydata
n <- 100
dat <- data.frame(
    date=sample(LETTERS[1:2], n, replace=TRUE),
    station=sample(letters[1:2], n, replace=TRUE),
    treatment=sample(0:1, n, replace=TRUE),
    subject=paste("R", sample(1:2, n, replace=TRUE), sep=""),
    par=runif(n, 0, 5)
)
head(dat)
  date station treatment subject       par
1    A       b         0      R2 3.2943880
2    A       a         0      R1 0.9253498
3    B       a         1      R1 4.7718907
4    B       b         0      R1 4.4892425
5    A       b         0      R1 4.7184853
6    B       a         1      R2 3.6184538


# divide par into 10 equal-width bins
dat$bin <- cut(dat$par, breaks=10)
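
When breaks is a single number, cut() splits the range of the data into that many equal-width intervals and moves the two outer limits outwards by 0.1% of the range so that the extremes fall inside; intervals are right-closed by default. That is why the first interval label does not start exactly at the minimum. A small sketch on made-up values (not data from the post) shows how this compares with explicit boundaries:

```r
x <- c(0, 2.5, 5, 7.5, 10)   # made-up values, for illustration only

# A single number: 2 equal-width bins, outer limits widened slightly
b <- cut(x, breaks = 2)
levels(b)
table(b)

# Explicit boundaries: nothing is widened, so include.lowest = TRUE is
# needed to keep the minimum; bins are (lower, upper] by default
closed <- cut(x, breaks = c(0, 5, 10), include.lowest = TRUE)
table(closed)

# right = FALSE switches to [lower, upper) bins, which moves the value 5
# from the first bin to the second
left <- cut(x, breaks = c(0, 5, 10), right = FALSE, include.lowest = TRUE)
table(left)
```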


library(plyr)

# count the rows in each date/station/treatment/bin combination
res <- ddply(dat, .(date, station, treatment, bin), 
  summarise, freq=length(treatment))
head(res)

  date station treatment             bin freq
1    A       a         0 (0.00422,0.501]    1
2    A       a         0   (0.501,0.998]    2
3    A       a         0      (1.5,1.99]    4
4    A       a         0     (1.99,2.49]    2
5    A       a         0     (2.49,2.99]    2
6    A       a         0     (2.99,3.48]    1
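
The bin column in this output holds interval labels, whereas the myres layout in the question asks for a numeric bin.centre. One possible way to get there is to pass explicit boundaries to cut() and map each bin index to its midpoint. The sketch below is an assumption-laden illustration, not part of the original answer: the seed is arbitrary, the data are simulated again, and base R's aggregate() is used for the counting step in place of ddply.

```r
set.seed(1)   # arbitrary seed, only to make the sketch reproducible
n <- 100
dat <- data.frame(
  date      = sample(LETTERS[1:2], n, replace = TRUE),
  station   = sample(letters[1:2], n, replace = TRUE),
  treatment = sample(0:1, n, replace = TRUE),
  par       = runif(n, 0, 5)
)

# Shared boundaries over the whole data set: 10 equal-width bins
breaks  <- seq(min(dat$par), max(dat$par), length.out = 11)
centres <- (breaks[-1] + breaks[-length(breaks)]) / 2

# labels = FALSE returns the bin index (1 to 10) instead of an interval label
idx <- cut(dat$par, breaks = breaks, include.lowest = TRUE, labels = FALSE)
dat$bin.centre <- centres[idx]

# Sum a column of ones per combination; as with ddply, combinations that
# contain no observations do not appear in the result
myres <- aggregate(freq ~ date + station + treatment + bin.centre,
                   data = transform(dat, freq = 1), FUN = sum)
myres <- myres[order(myres$date, myres$station, myres$treatment,
                     myres$bin.centre), ]
head(myres)
```

If empty bins should be listed with a frequency of 0, the counts would have to be merged against the full grid of combinations afterwards, for example one built with expand.grid().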