R data.table: summarizing by multiple factors

Time: 2015-04-23 08:33:11

Tags: r dataframe data.table aggregate plyr

I have the following data.table:

'data.frame':   66977 obs. of  16 variables:
 $ SUBS                         : int  
 $ CITY                         : Factor w/ 18 levels 
 $ VALUE_SEG                    : Factor w/ 7 levels 
 $ region                       : Factor w/ 5 levels 
 $ SUM.DATA_PPU_REV_DEC.        : num  
 $ SUM.DATA_BUNDLE_REV_DEC.     : int  
 $ SUM.DATA_USAGE_TOTAL_KB_DEC. : num  
 $ SUM.THIS_MONTH_REV_DEC.      : num  
 $ SUM.VOICE_ONNET_DURATION_DEC.: num  
 $ SUM.VOICE_ONNET_REV_DEC.     : num  
 $ SUM.VOICE_OFFNET_REV_DEC.    : num  
 $ SUM.SMS_ONNET_REV_DEC.       : num  
 $ SUM.SMS_OFFNET_REV_DEC.      : int  
 $ SUM.RECHARGE_DEC.            : int  
 $ STATUS_DEC                   : Factor w/ 5 levels 
 $ TYPE_DEC_2                   : Factor w/ 6 levels 

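I want to group it by two factor variables, say VALUE_SEG and region, get the sums of the numeric columns, and create a new column for each factor variable holding the number of observations. I tried aggregate, ddply and others and got various kinds of errors :( Thanks in advance.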

2 answers:

Answer 0 (score: 3)

Here is an option using data.table.
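A minimal sketch of what such a data.table approach could look like, not necessarily the code this answer originally showed. It assumes that every numeric column should be summed and that one row count per VALUE_SEG/region group is enough. The toy table `dt` and the column name `n_obs` are made up for illustration. On the question's data, SUBS is an integer column and would be summed as well, so it may need to be dropped from `num_cols`.

```r
library(data.table)

# Toy stand-in for the question's data: only a few of its columns are
# mimicked, and all values are invented for illustration.
dt <- data.table(
  VALUE_SEG = factor(c("A", "A", "B", "B", "B")),
  region    = factor(c("N", "N", "N", "S", "S")),
  CITY      = factor(c("x", "y", "x", "y", "y")),
  SUM.THIS_MONTH_REV_DEC. = c(1.5, 2.5, 3, 4, 5),
  SUM.RECHARGE_DEC.       = c(10L, 20L, 30L, 40L, 50L)
)

# Columns to total: every numeric (int or num) column.
num_cols <- names(dt)[sapply(dt, is.numeric)]

# Sum the numeric columns and count rows (.N) within each
# VALUE_SEG/region combination.
res <- dt[, c(lapply(.SD, sum), list(n_obs = .N)),
          by = .(VALUE_SEG, region), .SDcols = num_cols]
print(res)
```

If the data is still a plain data.frame, as the `str()` output in the question suggests, it can be converted first with `setDT(data)` or `as.data.table(data)`.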

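Answer 1 (score: 1)

I suggest using dplyr to handle the numeric and factor variables separately and summarize each set. It could look like this: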

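library(dplyr)

## For numeric variables: sum every SUM.* column within each VALUE_SEG/region group

data %>% select(VALUE_SEG, region, starts_with("SUM.")) %>% 
   group_by(VALUE_SEG, region) %>% summarise(across(everything(), sum), .groups = "drop") -> summary1

## For factors: count the observations behind each remaining factor column

data %>% select(VALUE_SEG, region, where(is.factor)) %>% 
   group_by(VALUE_SEG, region) %>% summarise(across(everything(), length), .groups = "drop") -> summary2

## Then you can merge these results on both grouping variables

Summary <- merge(summary1, summary2, by = c("VALUE_SEG", "region"))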

For more details on using this package, see this link.