My dataset has multiple duration columns containing both positive and negative values, like this:
require(data.table)
require(lubridate)
set.seed(43)
dt=data.table(A=as.duration(rnorm(100)*100), B=as.duration(rnorm(100)*100), C=as.duration(rnorm(100)*100))
I need to produce a summary for each column, but only for the negative values of that column, like this:
a=dt[as.numeric(A)<=0, summary(as.numeric(A))]
b=dt[as.numeric(B)<=0, summary(as.numeric(B))]
c=dt[as.numeric(B)<=0, summary(as.numeric(B))]  # reuses B (C was presumably intended), so V3 below duplicates V2
results1=data.table(as.list(a),as.list(b),as.list(c))
results1
V1 V2 V3
-208.7355428 -237.0840684 -237.0840684
-109.9255927 -90.91095008 -90.91095008
-64.83885801 -72.52487746 -72.52487746
-70.87867962 -74.7173011 -74.7173011
-25.19085368 -38.76434599 -38.76434599
-1.009403041 -1.733105648 -1.733105648
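My real dataset has many more columns like A..C, so writing a separate summary statement for each variable becomes tedious. Ideally I'd like to do this by iterating over each of these columns. I tried this: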
results=list()  # container for the per-column summaries
for (i in 1:3) {
  #col_name=paste("summary",i,sep="_")
  results[i] = dt[as.numeric(dt[[i]])<=0, .(summary(as.numeric(dt[[i]])))]
}
as.data.table(results)
V1 V2 V3
1: -208.7355428 -237.084068 -294.9729
2: -64.0973647 -65.496083 -76.02132
3: 0.1557914 8.544047 14.264934
4: 6.2315259 9.399898 0.5640193
5: 58.6766806 76.905693 77.687728
6: 261.6953128 211.874016 272.82491
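It seems that inside the for loop the summary is computed over all values (positive and negative) and the subsetting is ignored. What's wrong? Also, is there a way to capture the label for each value (i.e. the row names should be Min, 1st Qu, Median, Mean, 3rd Qu, Max)? Thanks.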
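For reference, here is a minimal sketch of what I suspect is going on and one way around it. My understanding is that inside `dt[i, j]`, an expression like `dt[[i]]` in `j` looks up the column in the original, unfiltered table, so the row filter never reaches it. The sketch below filters each vector directly instead. The helper name `neg_summary` is my own invention for illustration, not something from data.table or lubridate, and I am assuming that `sapply` keeps the summary labels as row names.

```r
library(data.table)
library(lubridate)

set.seed(43)
dt <- data.table(A = as.duration(rnorm(100) * 100),
                 B = as.duration(rnorm(100) * 100),
                 C = as.duration(rnorm(100) * 100))

# Hypothetical helper: convert to numeric, keep only values <= 0
# (same condition as in the question), then summarise.
neg_summary <- function(x) {
  x <- as.numeric(x)
  summary(x[x <= 0])
}

# One summary per column; the result is a numeric matrix whose row names
# are the labels produced by summary() (Min., 1st Qu., ...).
res <- sapply(dt, neg_summary)

# Keep the labels as a regular column (named "rn") in a data.table.
results <- as.data.table(res, keep.rownames = TRUE)
results
```

Because the filter is applied to each vector inside the helper, the same call should work for any number of columns without a loop index.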