Can xtabs be used for the mean or median instead of the sum?

Time: 2015-09-10 06:55:41

Tags: r

My data looks like this:

'data.frame':	833233 obs. of  22 variables:
 $ ProductId                      : num  105422 105422 143863 170645 397474 ...
 $ Brand                          : num  NA NA NA NA NA NA NA NA NA NA ...
 $ Supplier                       : Factor w/ 788 levels "[00000] 武商量贩",..: 1 113 265 154 99 99 99 99 99 99 ...
 $ Mode.of.operations             : Factor w/ 3 levels "[1] Distribution",..: 1 1 1 3 2 2 2 2 2 2 ...
 $ Category                       : Factor w/ 27 levels "[01] Fuits and Vegetables",..: 5 5 9 1 22 22 22 22 22 22 ...
 $ Profit.margin                  : num  0 0 237.95 0 1.16 ...
 $ Profit.margin.percentage       : num  0 0 0.1 0 0.17 ...

I use xtabs as follows:

xtabs(Profit.margin~Category+Mode.of.operations,wushang)

This gives me the sum of Profit.margin for each category under each mode of operation, like this:

                                         Mode.of.operations
Category                           [1] Distribution [2] Reseller [4] Joint venture
  [01] Fuits and Vegetables                95103.75         0.00         331445.89
  [02] Livestocks                         282948.03     10982.10          91013.51
  [03] Fisheries                           21632.49         0.00         114708.34
  [04] Food category                       14236.32      5289.90         286585.22
  [05] Daily distribution category       1039396.38     53995.36         222966.99
  [06] Grains                             640183.46    150810.26          64068.74
  [07] seasoning spices                   251716.98    175242.57         156037.71
  [08] canned vegetables                   15938.47     51549.80              0.00
  [09] cigarette, wine and tea            810113.98    550314.93          43743.06
  [10] candy cookies                      605020.64     92855.09         626064.09

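I am also interested in the mean and median instead of the sum. Is there a way to get xtabs to do this? Or is there some other function that achieves the same result?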

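My data has NA / #NA values, so I would like whatever function I use to give me 0 instead of NA in its output, because I have to apply rowPerc afterwards and it simply skips any row that has an NA in it.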
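As far as I know, xtabs itself only sums. One possible workaround, sketched here on a small made-up data frame (`toy`, its values, and the names `cell_means` and `mean_tab` are assumptions, not part of the real data), is to compute the per-cell mean with aggregate first and then let xtabs lay out that single value per cell. Combinations with no rows then come out as 0 rather than NA.

# Toy stand-in for the real data; all values are made up for illustration.
toy <- data.frame(
  Category = factor(c("A", "A", "A", "B", "B", "C")),
  Mode.of.operations = factor(c("M1", "M1", "M2", "M1", "M1", "M2")),
  Profit.margin = c(10, 20, 6, 4, NA, 8)
)

# Step 1: one row per non-empty Category x Mode combination, holding the mean.
# The formula interface of aggregate() drops rows containing NA by default.
cell_means <- aggregate(Profit.margin ~ Category + Mode.of.operations,
                        data = toy, FUN = mean)

# Step 2: xtabs() sums that single value per cell; cells with no rows become 0.
mean_tab <- xtabs(Profit.margin ~ Category + Mode.of.operations,
                  data = cell_means)
mean_tab

Swapping FUN = mean for FUN = median should give the median table in the same way. Note that a 0 in the result is ambiguous: it can mean "no observations" or a genuine mean of 0.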

Edit 1: The tapply function can give the mean and median, but its output contains NA.

> with(wushang, tapply(Profit.margin,list(Category,Mode.of.operations), mean))

Output:

                                  [1] Distribution [2] Reseller [4] Joint venture
[01] Fuits and Vegetables              29.5904636           NA        43.2753480
[02] Livestocks                        47.9248018     9.076116        89.9342984
[03] Fisheries                         33.5908230           NA        45.7552214
[04] Food category                     13.9435064    13.324685        47.7403332
[05] Daily distribution category       27.8942724    58.563297        41.7854179
[06] Grains                            35.7464660    14.332851        27.0446349
[07] seasoning spices                  11.9870937     8.398877        34.4378084
[08] canned vegetables                  5.0566212     8.977673                NA
[09] cigarette, wine and tea           79.4540977    31.158132       146.2978595
[10] candy cookies                     18.8974463     9.113268        61.0555968
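The NA cells above appear to be the Category / Mode.of.operations combinations that have no rows at all, so tapply has nothing to pass to mean. A minimal sketch of one way around this, again on made-up data (`toy`, `mean_mat`, and `median_mat` are hypothetical names), is to overwrite the NA cells after the fact:

# Toy stand-in for the real data; all values are made up for illustration.
toy <- data.frame(
  Category = factor(c("A", "A", "A", "B", "B", "C")),
  Mode.of.operations = factor(c("M1", "M1", "M2", "M1", "M1", "M2")),
  Profit.margin = c(10, 20, 6, 4, NA, 8)
)

# na.rm = TRUE is passed through to mean(), so NA values inside a cell are ignored.
mean_mat <- with(toy, tapply(Profit.margin,
                             list(Category, Mode.of.operations),
                             mean, na.rm = TRUE))

# Empty combinations are still NA at this point; overwrite them with 0.
mean_mat[is.na(mean_mat)] <- 0

# The same pattern with median instead of mean.
median_mat <- with(toy, tapply(Profit.margin,
                               list(Category, Mode.of.operations),
                               median, na.rm = TRUE))
median_mat[is.na(median_mat)] <- 0

mean_mat
median_mat

is.na() is also TRUE for NaN, so a cell whose values are all NA (where mean(..., na.rm = TRUE) returns NaN) is zeroed as well.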

And after applying rowPerc to it, the entire row is skipped wherever there is an NA:

> rowPerc(with(wushang, tapply(Profit.margin,list(Category,Mode.of.operations), mean)))


                                 [1] Distribution [2] Reseller [4] Joint venture  Total
[01] Fuits and Vegetables                                                        100.00
[02] Livestocks                             32.62         6.18             61.21 100.00
[03] Fisheries                                                                   100.00
[04] Food category                          18.59        17.76             63.65 100.00
[05] Daily distribution category            21.75        45.67             32.58 100.00
[06] Grains                                 46.35        18.58             35.07 100.00
[07] seasoning spices                       21.86        15.32             62.82 100.00
[08] canned vegetables                                                           100.00
[09] cigarette, wine and tea                30.93        12.13             56.95 100.00
[10] candy cookies                          21.22        10.23             68.55 100.00

How can I make this work? Thanks.
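For reference, once the NA cells have been replaced by 0, the row percentages can also be computed in base R without rowPerc. The sketch below assumes a zero-filled matrix of cell means; `mean_mat` and its values are made up for illustration, and prop.table with margin = 1 is swapped in for rowPerc rather than being the same function.

# Hypothetical zero-filled matrix of cell means (values made up for illustration).
mean_mat <- matrix(
  c(15, 4, 0,   # column M1
     5, 0, 8),  # column M2
  nrow = 3,
  dimnames = list(c("A", "B", "C"), c("M1", "M2"))
)

# margin = 1 divides every cell by its row total; times 100 gives percentages.
pct <- prop.table(mean_mat, margin = 1) * 100

# Append a Total column, similar in layout to the rowPerc output above.
row_pct <- round(cbind(pct, Total = rowSums(pct)), 2)
row_pct

A row that is entirely 0 would produce NaN here (0 divided by 0), so such rows would still need separate handling.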

0 answers:

No answers yet.