R - Pareto-like summary of a histogram

Date: 2015-04-06 09:55:25

Tags: r

I want to generate a summary of a histogram in table format. With plot = FALSE, I can get the histogram object.

 > hist(y,plot=FALSE)
$breaks
 [1] 0.2 0.4 0.6 0.8 1.0 1.2 1.4 1.6 1.8 2.0 2.2 2.4 2.6 2.8 3.0 3.2 3.4 3.6 3.8

$counts
 [1]      48    1339   20454  893070 1045286   24284     518     171     148
[10]      94      42      42      37      25      18      21      14       5

$density
 [1] 0.00012086929 0.00337174962 0.05150542703 2.24884871999 2.63214538964
 [6] 0.06114978928 0.00130438111 0.00043059685 0.00037268032 0.00023670236
[11] 0.00010576063 0.00010576063 0.00009317008 0.00006295276 0.00004532598
[16] 0.00005288032 0.00003525354 0.00001259055

$mids
 [1] 0.3 0.5 0.7 0.9 1.1 1.3 1.5 1.7 1.9 2.1 2.3 2.5 2.7 2.9 3.1 3.3 3.5 3.7

$xname
[1] "y"

$equidist
[1] TRUE

attr(,"class")
[1] "histogram"

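Is there a way to summarize this object as a table, like a Pareto chart summary? (The summary below is for different data and is included only as an example of the desired format.)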

Pareto chart analysis for counts
    Frequency Cum.Freq.   Percentage Cum.Percent.
  c   2294652   2294652 33.689225770     33.68923
  f   1605467   3900119 23.570868362     57.26009
  g    896893   4797012 13.167848880     70.42794
  i    464220   5261232  6.815505091     77.24345
  b    365399   5626631  5.364651985     82.60810
  j    332239   5958870  4.877809219     87.48591
  h    215313   6174183  3.161145249     90.64705
  l    129871   6304054  1.906717637     92.55377
  e    107001   6411055  1.570948818     94.12472
  k    104954   6516009  1.540895526     95.66562
  d    103648   6619657  1.521721321     97.18734
  m     56172   6675829  0.824696377     98.01203
  o     51093   6726922  0.750128391     98.76216
  n     49320   6776242  0.724097865     99.48626
  p     32321   6808563  0.474524881     99.96079
  q      1334   6809897  0.019585291     99.98037
  r       620   6810517  0.009102609     99.98947
  s       247   6810764  0.003626362     99.99310
  u       182   6810946  0.002672056     99.99577
  t       162   6811108  0.002378424     99.99815
  z       126   6811234  0.001849885    100.00000

1 Answer:

Answer 0 (score: 1)

You can write a wrapper function that converts the relevant parts of the hist output into a data.frame:

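myfun <- function(x) {
    # Compute the histogram without drawing it
    h <- hist(x, plot = FALSE)
    # One row per bin, in bin order; Percentage and Cum.Percent are
    # proportions between 0 and 1 (multiply by 100 for percent)
    data.frame(Frequency = h$counts,
               Cum.Freq = cumsum(h$counts),
               Percentage = h$density/sum(h$density),
               Cum.Percent = cumsum(h$density)/sum(h$density))
}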

Here is an example with the built-in iris dataset:

myfun(iris$Sepal.Width)
#    Frequency Cum.Freq  Percentage Cum.Percent
# 1          4        4 0.026666667  0.02666667
# 2          7       11 0.046666667  0.07333333
# 3         13       24 0.086666667  0.16000000
# 4         23       47 0.153333333  0.31333333
# 5         36       83 0.240000000  0.55333333
# 6         24      107 0.160000000  0.71333333
# 7         18      125 0.120000000  0.83333333
# 8         10      135 0.066666667  0.90000000
# 9          9      144 0.060000000  0.96000000
# 10         3      147 0.020000000  0.98000000
# 11         2      149 0.013333333  0.99333333
# 12         1      150 0.006666667  1.00000000
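
The wrapper above keeps the bins in their original order and reports proportions, whereas the Pareto summary in the question is sorted by descending frequency and reports percentages. A minimal sketch of that variant might look like the following. The function name `pareto_hist` and the interval-style row labels are assumptions introduced here for illustration; they are not part of the original answer. Percentages are computed from the counts rather than the densities, so the result should also hold if the bins are not equally wide.

```r
# Hypothetical helper (not from the original answer): a Pareto-style
# summary of histogram bins, sorted by descending frequency.
pareto_hist <- function(x) {
    h <- hist(x, plot = FALSE)
    # Label each bin by its interval; with the default right = TRUE,
    # hist() uses right-closed bins (the first bin also includes its left end).
    labels <- paste0("(", head(h$breaks, -1), ", ", tail(h$breaks, -1), "]")
    # Sort bins from most to least frequent, as a Pareto table does
    ord <- order(h$counts, decreasing = TRUE)
    freq <- h$counts[ord]
    data.frame(Frequency = freq,
               Cum.Freq = cumsum(freq),
               Percentage = 100 * freq / sum(freq),
               Cum.Percent = 100 * cumsum(freq) / sum(freq),
               row.names = labels[ord])
}

res <- pareto_hist(iris$Sepal.Width)
head(res, 3)
```

Sorting before taking the cumulative sums is what makes Cum.Percent read as "the top k bins cover this share of the data", which is the usual purpose of a Pareto table.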