Question

I have the following R data table:

> head(dt)
      X0   X1   X2  X3   X4  X5 X6 X7  X8 X9 X10 X11 X12 X13 X14 X15 grp
1: 33653 2325  916 720  867 187 31  0   6  3  42  56  92  15  69   0 a-4
2: 18895  414 1116 570 1190  55 92  0 122 23  78   6   4   2  11   0 a-3
3:  1383   70   27  17   17   1  0  0   0  0   1   0   0   0   3   0 a-6
4:   396   72   34   5   18   0  0  0   0  0   0   0   0   0   0   0 a-5
5:  3915 1170  402 832 2791 316 12  5 118 51  32   9  62  27   1   0 a-3
6:   554   33  138  13  415   4  5  0   0  0   0   0   0   0   0   0 a-5

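I want to create a new data frame that holds column-wise aggregates based on the values in the grp column. In the 6 records above, row 2 and row 5 should be added together, as should row 4 and row 6, so the new data table would have 4 rows instead of 6.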

I tried using ddply as follows:

> ddply(dt, numcolwise(sum))

But I end up with the following error:

Error in UseMethod("as.quoted") : 
  no applicable method for 'as.quoted' applied to an object of class "function"

Answer 1

You can do this easily with data.table:

library(data.table)
options(stringsAsFactors=F)
## toy data: a 6 x 16 matrix of 1s plus the grp column from the question
dt <- data.table(
  matrix(rep(1,96),ncol=16))
dt[,grp:=c(
  "a-4","a-3","a-6",
  "a-5","a-3","a-5")]
## sum every non-grouping column (.SD) within each value of grp
> dt[,lapply(.SD,sum),by=grp]
   grp V1 V2 V3 V4 V5 V6 V7 V8 V9 V10 V11 V12 V13 V14 V15 V16
1: a-4  1  1  1  1  1  1  1  1  1   1   1   1   1   1   1   1
2: a-3  2  2  2  2  2  2  2  2  2   2   2   2   2   2   2   2
3: a-6  1  1  1  1  1  1  1  1  1   1   1   1   1   1   1   1
4: a-5  2  2  2  2  2  2  2  2  2   2   2   2   2   2   2   2
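If you would rather not depend on data.table, base R's aggregate() can do the same grouped sum. This is an alternative to the answer above, not part of it. The sketch assumes a plain data.frame holding only the first five numeric columns (X0 to X4) of the six rows shown in the question, to keep it short.

```r
# Base R sketch (no extra packages): sum every numeric column within each grp.
# Assumption: only columns X0-X4 of the six rows from the question, as a data.frame.
df <- data.frame(
  X0 = c(33653, 18895, 1383, 396, 3915, 554),
  X1 = c(2325, 414, 70, 72, 1170, 33),
  X2 = c(916, 1116, 27, 34, 402, 138),
  X3 = c(720, 570, 17, 5, 832, 13),
  X4 = c(867, 1190, 17, 18, 2791, 415),
  grp = c("a-4", "a-3", "a-6", "a-5", "a-3", "a-5")
)

# The formula ". ~ grp" means "every other column, grouped by grp".
agg <- aggregate(. ~ grp, data = df, FUN = sum)
print(agg)
```

With the full table, the same formula picks up all of X0 to X15 automatically. The result has one row per group, sorted by grp (a-3, a-4, a-5, a-6); the a-3 row is the sum of rows 2 and 5, and the a-5 row is the sum of rows 4 and 6.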

Edit
Here is how I would try to visualize the data. I'll use a slightly different dataset: the same structure, different numbers:

library(data.table)
library(ggplot2)
options(stringsAsFactors=F)
##
dt <- data.table(
  matrix(1:96,ncol=16))
dt[,grp:=c(
  "a-4","a-3","a-6",
  "a-5","a-3","a-5")]
##
gt <- dt[,lapply(.SD,sum),by=grp]
> gt
   grp V1 V2 V3 V4 V5 V6 V7 V8  V9 V10 V11 V12 V13 V14 V15 V16
1: a-4  1  7 13 19 25 31 37 43  49  55  61  67  73  79  85  91
2: a-3  7 19 31 43 55 67 79 91 103 115 127 139 151 163 175 187
3: a-6  3  9 15 21 27 33 39 45  51  57  63  69  75  81  87  93
4: a-5 10 22 34 46 58 70 82 94 106 118 130 142 154 166 178 190

First, reshape the data.table from wide format to long format:

gt_long <- reshape(
  gt,
  direction="long",
  varying=list(names(gt)[-1]),
  v.names="Value",
  idvar="grp",
  timevar="V_ID",
  times=paste0("V",1:16))
> head(gt_long)
   grp V_ID Value
1: a-4   V1     1
2: a-3   V1     7
3: a-6   V1     3
4: a-5   V1    10
5: a-4   V2     7
6: a-3   V2    19
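Since the data is already a data.table, melt() from the data.table package is an alternative to stats::reshape() for the wide-to-long step. This is a swapped-in technique, not what this answer uses; the sketch rebuilds gt from the same toy data first so it runs on its own. A side benefit is that melt() returns the variable column as a factor whose levels follow the original column order (V1 through V16).

```r
library(data.table)

# Rebuild the toy data and the grouped sums from the answer above.
dt <- data.table(matrix(1:96, ncol = 16))
dt[, grp := c("a-4", "a-3", "a-6", "a-5", "a-3", "a-5")]
gt <- dt[, lapply(.SD, sum), by = grp]

# Wide to long: keep grp as the id column and stack V1 to V16
# into a V_ID column (the old column name) and a Value column.
gt_long <- melt(gt, id.vars = "grp",
                variable.name = "V_ID", value.name = "Value")

print(head(gt_long))
# V_ID is a factor whose levels follow the original column order.
print(levels(gt_long$V_ID))
```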

Then you can treat the V_ID values (V1 to V16) as a factor variable, and ggplot2 offers a few options:

ggplot(
  data=gt_long,
  aes(x=V_ID,y=Value,color=grp))+
  geom_point(size=5,alpha=.75)+
  scale_colour_brewer(type="div",palette=4)

Or, if that is too cluttered for you:

ggplot(
  data=gt_long,
  aes(x=V_ID,y=Value,color=grp))+
  geom_point(size=4)+
  facet_grid(grp ~ .)

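Edit 2: There may be a cleaner way to put the levels in the right order, but this works. V_ID is a character column, so ggplot2 orders it alphabetically (V1, V10, V11, and so on up to V16, then V2); converting it to a factor whose levels follow their order of appearance fixes that. I made a copy of the gt_long object so that I could check this worked without modifying the original, but you can use the original object directly.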

gt_long2 <- copy(gt_long)
v_levels <- unique(gt_long2$V_ID)
gt_long2[,V_ID:=factor(
  V_ID,
  levels=v_levels,
  labels=v_levels)]

I won't add the plots here, but when I re-plotted them using gt_long2 they looked fine.
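The level-ordering problem that Edit 2 works around can be seen in isolation with base R. This sketch uses a standalone vector named ids (a hypothetical name, not from the answer) instead of gt_long: factor() sorts its default levels alphabetically, while passing unique() values as the levels keeps the order of first appearance, which is the same idea as v_levels above.

```r
ids <- paste0("V", 1:16)

# Default factor levels are the sorted unique values,
# so "V10" to "V16" come before "V2".
default_levels <- levels(factor(ids))
print(default_levels)

# Using the order of first appearance as the levels gives V1, V2, ..., V16.
ordered_ids <- factor(ids, levels = unique(ids))
print(levels(ordered_ids))
```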

Answer 2

If you want the row sums grouped by the grp variable, the following code works; it uses only two columns, X0 and X1, as an example.

library(plyr)
s <- ddply(dt, c("grp"), summarise, New_x0=sum(X0), New_x1=sum(X1))
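The error in the question comes from argument order: ddply()'s second argument is .variables (the grouping columns) and the function goes third, so numcolwise(sum) was being interpreted as a grouping specification, which is what the as.quoted message complains about. A minimal sketch of the corrected call, assuming plyr is installed and using the same X0 to X4 subset of the question's data as a plain data.frame, sums every numeric column per grp without naming each one:

```r
library(plyr)

# Assumption: only columns X0-X4 of the six rows from the question.
df <- data.frame(
  X0 = c(33653, 18895, 1383, 396, 3915, 554),
  X1 = c(2325, 414, 70, 72, 1170, 33),
  X2 = c(916, 1116, 27, 34, 402, 138),
  X3 = c(720, 570, 17, 5, 832, 13),
  X4 = c(867, 1190, 17, 18, 2791, 415),
  grp = c("a-4", "a-3", "a-6", "a-5", "a-3", "a-5")
)

# .variables = "grp" splits the rows by group;
# numcolwise(sum) applies sum() to every numeric column of each piece.
res <- ddply(df, "grp", numcolwise(sum))
print(res)
```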

R: aggregate columns by the values of a factor column and create a new data frame
