Iteratively subsetting a data.table

Time: 2018-06-10 23:55:50

Tags: r data.table

My dataset has several duration columns containing both positive and negative values, like this:

library(data.table)  # provides data.table()
library(lubridate)   # provides as.duration()
set.seed(43)
dt=data.table(A=as.duration(rnorm(100)*100), B=as.duration(rnorm(100)*100), C=as.duration(rnorm(100)*100))

I need to generate a summary for each column, but only over that column's negative values, like this:

a=dt[as.numeric(A)<=0, summary(as.numeric(A))]
b=dt[as.numeric(B)<=0, summary(as.numeric(B))]
c=dt[as.numeric(B)<=0, summary(as.numeric(B))]  # repeats B (C was probably intended), so V3 below duplicates V2
results1=data.table(as.list(a),as.list(b),as.list(c))
results1
     V1               V2            V3 
-208.7355428    -237.0840684    -237.0840684 
-109.9255927    -90.91095008    -90.91095008 
-64.83885801    -72.52487746    -72.52487746 
-70.87867962    -74.7173011     -74.7173011 
-25.19085368    -38.76434599    -38.76434599 
-1.009403041    -1.733105648    -1.733105648

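My real dataset has many more columns like A..C, so writing a separate summary statement for each variable gets tedious. Ideally I would do this by iterating over the columns. I tried this: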

results=list()  # container for the per-column summaries
for (i in 1:3) {
    #col_name=paste("summary",i,sep="_")
    results[i] = dt[as.numeric(dt[[i]])<=0, .(summary(as.numeric(dt[[i]])))]
}
as.data.table(results)
         V1              V2        V3 
1:  -208.7355428    -237.084068 -294.9729 
2:  -64.0973647     -65.496083  -76.02132 
3:  0.1557914       8.544047    14.264934 
4:  6.2315259       9.399898    0.5640193 
5:  58.6766806      76.905693   77.687728 
6:  261.6953128     211.874016  272.82491

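It seems that inside the for loop the summary is computed over all values (positive and negative) and the subsetting is ignored. What is going wrong? Also, is there a way to capture the label of each value (i.e. the row names should be Min., 1st Qu., Median, Mean, 3rd Qu., Max.)? Thanks.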
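One likely explanation for the behaviour described above: inside `j`, the expression `dt[[i]]` looks up the whole table `dt` again rather than the rows selected in `i`, so `summary()` sees every value. The following is a minimal sketch of one way around that, not a definitive solution. It assumes plain numeric columns stand in for the duration columns, and the helper `neg_summary` and the column name `stat` are hypothetical names introduced only for illustration.

```r
library(data.table)

set.seed(43)
# Assumption: plain numeric columns stand in for the duration columns
dt <- data.table(A = rnorm(100) * 100, B = rnorm(100) * 100, C = rnorm(100) * 100)

cols <- c("A", "B", "C")

# Hypothetical helper: filter the vector itself, then summarise it.
# as.numeric() is kept so the same helper also accepts duration columns.
neg_summary <- function(x) {
  x <- as.numeric(x)
  summary(x[x <= 0])
}

# One summary per column; sapply() simplifies the results to a matrix
# whose row names are the summary labels (Min., 1st Qu., ...)
res <- sapply(cols, function(col) neg_summary(dt[[col]]))

# Turn the row names into a regular column so the labels are kept
results <- as.data.table(res, keep.rownames = TRUE)
setnames(results, "rn", "stat")
results
```

Filtering the vector before calling `summary()` keeps the subset and the summarised values in sync, and converting the matrix with `keep.rownames = TRUE` preserves the labels as a column.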

0 answers:

No answers