I have a data frame like this:
df<-data.frame(date=c(rep("1/27/2010",times=30)),
loc1=c(rep(9:13,each=6)),
loc2=c(rep(c("N","E","W"),each=2)),
loc3=c(rep(c(1,2))),
tr1=c(rep(c(0,1),each=15)),
tr2=c(0,0,0,0,0,0,0,1,1,1,1,1,1,1,1,0,0,0,0,0,0,0,0,1,1,1,1,1,1,1),
tr3=c(1,1,1,1,1,1,1,2,2,2,2,2,2,2,2,3,3,3,3,3,3,3,4,4,4,4,4,4,4,4),
Birth=c(sample(c("early","late"),30,replace=TRUE,prob=c(0.5,0.5))),
Species=c(rep(c("A","B"),times=15)),
Status=c(sample(c(0,1),30,replace=TRUE,prob=c(0.7,0.3))))
df<-rbind(df,df)
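I want to create a separate column for each value of loc3, with rows defined by loc1, loc2, tr1, tr2, tr3, Birth and Species. I want to "count" the Status of all observations that share those values, with the counts grouped by loc3.
I plan to use dcast from the reshape2 package.
I wrote a function to do the 'counting' I want. I'm new to R, and while I'm sure a function already exists for this, I couldn't find it right away, and given how simple the task is, writing it myself seemed like a worthwhile exercise.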
d.count<-function(x){
j=0
for (i in 1:length(x))
if (is.na(x[i])){
j<-j+0
}else if(x[i]==0){
j<-j+1
} else if(x[i]==1){
j<-j+0
}
return(j)
}
0s should increment the count; 1s and NAs should not.
So
df_1<-dcast(df,date+loc1+loc2+tr1+tr2+tr3+Birth+Species~loc3,value.var="Status",fun.aggregate=d.count)
I get the error
Error in if (is.na(x[i])) { : argument is of length zero
which makes me think I don't understand how dcast treats fun.aggregate...
Thanks for your help! -JJE
Answer 0 (score: 2)
Why not use the tabulate function?
require(reshape2)
dcast(df, ... ~ loc3, value.var = "Status", fun.aggregate = tabulate)
## date loc1 loc2 tr1 tr2 tr3 Birth Species 1 2
## 1 1/27/2010 9 E 0 0 1 early A 0 0
## 2 1/27/2010 9 E 0 0 1 early B 0 0
## 3 1/27/2010 9 N 0 0 1 early B 0 0
## 4 1/27/2010 9 N 0 0 1 late A 0 0
## 5 1/27/2010 9 W 0 0 1 early B 0 0
## 6 1/27/2010 9 W 0 0 1 late A 0 0
## 7 1/27/2010 10 E 0 1 2 late A 0 0
## 8 1/27/2010 10 E 0 1 2 late B 0 2
## 9 1/27/2010 10 N 0 0 1 late A 0 0
## 10 1/27/2010 10 N 0 1 2 late B 0 2
## 11 1/27/2010 10 W 0 1 2 late A 0 0
## 12 1/27/2010 10 W 0 1 2 late B 0 0
## 13 1/27/2010 11 E 0 1 2 late A 0 0
## 14 1/27/2010 11 E 1 0 3 early B 0 2
## 15 1/27/2010 11 N 0 1 2 early B 0 0
## 16 1/27/2010 11 N 0 1 2 late A 0 0
## 17 1/27/2010 11 W 1 0 3 late A 0 0
## 18 1/27/2010 11 W 1 0 3 late B 0 2
## 19 1/27/2010 12 E 1 0 3 early B 0 0
## 20 1/27/2010 12 E 1 0 3 late A 0 0
## 21 1/27/2010 12 N 1 0 3 early A 2 0
## 22 1/27/2010 12 N 1 0 3 early B 0 2
## 23 1/27/2010 12 W 1 0 4 early A 0 0
## 24 1/27/2010 12 W 1 1 4 early B 0 0
## 25 1/27/2010 13 E 1 1 4 early B 0 0
## 26 1/27/2010 13 E 1 1 4 late A 0 0
## 27 1/27/2010 13 N 1 1 4 late A 0 0
## 28 1/27/2010 13 N 1 1 4 late B 0 2
## 29 1/27/2010 13 W 1 1 4 early A 0 0
## 30 1/27/2010 13 W 1 1 4 early B 0 2
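A quick base-R check may help in reading the table above. tabulate counts occurrences of the positive integers 1, 2, ..., so with a 0/1 Status the single value it returns is the number of 1s; 0s and NAs are ignored. The vectors below are made up for illustration and are not taken from df.

```r
# tabulate() bins positive integers only; zeros and NAs are dropped.
ones_only  <- tabulate(c(1, 1))         # 2: two 1s
zeros_only <- tabulate(c(0, 0))         # 0: no 1s to count
mixed      <- tabulate(c(0, 1, NA, 1))  # 2: the 0 and the NA are ignored

print(c(ones_only, zeros_only, mixed))
```

That is why the cells show 0 or 2 here: each group contains the same observation twice (because of rbind(df, df)), and it is counted only when its Status is 1.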
Edit
If you want to count the number of 0s, for example:
dcast(df, ... ~ loc3, value.var = "Status",
fun.aggregate = function(x) sum(x == 0, na.rm = TRUE))
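On the original error: as far as I can tell, dcast also evaluates fun.aggregate on a zero-length vector to work out a fill value for empty cells. With a zero-length x, 1:length(x) is c(1, 0) rather than an empty sequence, so the loop reaches x[0], is.na() returns logical(0), and if() fails with "argument is of length zero". A minimal sketch of a loop-based d.count that avoids this, assuming the intent stated in the question (0s count, 1s and NAs do not); count_zeros is a hypothetical name for the vectorised form used in the edit:

```r
# 1:length(x) on an empty vector counts down from 1 to 0
empty_seq <- 1:length(numeric(0))   # c(1L, 0L), not an empty sequence

# Loop version: seq_along(x) is empty when x is empty, so the body is skipped
d.count <- function(x) {
  j <- 0
  for (i in seq_along(x)) {
    # Count only non-missing zeros; 1s and NAs leave j unchanged
    if (!is.na(x[i]) && x[i] == 0) {
      j <- j + 1
    }
  }
  j
}

# Vectorised equivalent (hypothetical helper name)
count_zeros <- function(x) sum(x == 0, na.rm = TRUE)

d.count(numeric(0))          # 0
d.count(c(0, 1, NA, 0))      # 2
count_zeros(c(0, 1, NA, 0))  # 2
```

Either function returns a single number for an empty input, so either should be usable as fun.aggregate in the dcast call above.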