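Histogram frequencies

Time: 2019-06-19 21:08:49

Tags: r histogram

I have a dataset of about 1000 data points (values from 987 to 61,515), and I am trying to create a histogram that shows the probability density rather than the frequency.

I tried setting "freq = FALSE" or "prob = TRUE", but in both cases I get the wrong percentages.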

hist(data1000, freq = FALSE)

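I expected my y-axis to show density (0.0 to 1.0), but the actual y-axis values run from 0.00001 to 0.000015.

Can I upload the output? I believe that would make my question clearer.

Here is my dataset: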

data= c(18124,12957,10232,19156,22015,14467,9812,36416,9530,6848,12029,20201,21787,19953,17698,14217,14771,22480,11063,31452,22196,2580,12973,8020,6632,19522,12047,7544,61515,12929,8623,10485,12612,10461,14014,8986,19864,12554,8071,14428,6924,11808,14238,16718,41123,23910,3615,13130,32555,14860,22347,8288,10390,18384,5542,41845,14156,5391,15015,14515,11571,20426,29791,10785,10820,18001,18291,11912,13037,24351,18694,13024,18185,12200,8025,10229,12218,6802,14127,10215,5582,3480,6691,25749,28012,24980,31255,10864,16890,5863,11369,1967,10232,14748,14943,8201,14804,15001,19112,21836,7309,9612,16788,13326,24983,16130,8633,22003,7272,12709,14404,23135,10758,43422,9859,6864,28675,21013,7879,6600,16426,15693,18225,13613,14643,32442,21591,23613,17259,17336,6127,20072,12419,5396,9371,8326,24437,10195,13930,35118,13303,10922,10452,11841,21410,9812,21312,7599,11719,11921,20493,19485,29040,13880,10618,22020,13143,8529,15380,9287,18536,25477,27116,14826,17309,18272,12793,19918,21231,10824,8421,30132,12006,17623,23309,25103,29187,12886,23328,23889,20766,14943,29909,23986,8476,17588,14565,13592,19408,36739,13488,11929,38903,7608,47485,10201,28221,30662,19382,27255,27029,22341,18261,50145,27973,49933,30022,37339,9482,9696,25198,9322,37734,10881,5165,11176,27707,29747,25769,18764,35669,40017,28801,23393,11792,25543,17552,45900,12135,11495,37428,19765,7205,24715,36810,13453,11273,12044,16910,23625,52021,17858,8571,15845,12432,26575,20768,28757,36219,19871,14319,15865,18824,13871,26157,16520,18385,43970,27882,15761,10565,30181,18972,10325,16724,25191,18755,23134,3517,13794,9422,11078,32387,15043,14587,18243,29831,22846,15758,14534,22022,19180,18598,18037,22183,29266,53410,19083,9519,20478,15904,14385,18483,13672,9530,11101,19891,9984,12445,8872,12720,8277,3878,6569,11947,19384,9258,7090,8456,9313,9752,14374,13182,31067,20905,9420,8137,9005,14460,21410,11236,47406,13247,15373,28414,23889,9384,21116,19878,32668,18491,15016,26640,28870,14505,18009,38628,11889,13065,19236,35277,13639,14950,
25448,23388,6886,8888,23417,21360,24183,19521,14651,28611,15705,16157,16458,17386,17428,16370,14609,8791,14463,18153,5586,8806,14305,17216,16793,22897,16598,8837,32668,11741,11761,10826,15865,19805,10252,19258,17174,7874,7581,22427,8549,25789,20059,23891,19380,11138,17154,16622,12423,9652,13072,27632,20082,20308,11614,13287,6746,21413,16531,15557,12108,21136,22857,6950,17734,42772,20374,14177,24593,14897,8064,23842,15699,21295,24693,20505,12341,20239,16609,15061,20737,23763,15882,10423,16354,22338,18082,24631,12607,22930,9116,41550,11874,32281,18024,17641,19413,17550,6997,13388,6709,11070,10751,19738,8461,14106,11309,18259,14254,19457,15169,16567,15991,17634,27156,15566,11907,12449,15437,21896,12022,17617,10018,20314,11880,17745,22766,15548,19714,11118,13958,16392,8108,31388,13406,17098,20208,32396,13931,15951,11869,15222,29971,9054,21628,7601,13030,17674,47025,16801,12934,23975,14908,11504,11207,15692,9028,7502,6879,15785,15375,8033,15774,5761,14715,23413,17876,18937,12706,17326,14689,8155,14753,23087,10946,9761,17078,13083,14374,13550,26252,17484,14779,15608,17504,28028,15012,10773,13944,4210,17535,11707,20923,18299,8341,20755,21588,17056,5158,9001,11628,15787,21561,14259,11304,14782,8744,4616,3701,5557,8188,6139,8348,6600,17612,7674,4850,8757,4239,11920,9887,16467,9885,9617,6361,7134,11003,11455,10573,23016,9674,2270,9931,3479,8726,6219,1754,2186,8427,3174,3657,3061,20212,14538,12810,7103,2184,8806,7211,6077,11269,6294,19041,1568,10383,7847,3761,11171,10425,15267,15685,10930,2321,11362,11761,7240,5590,3610,12881,5156,15220,8425,7320,13014,7236,10219,4060,10886,8591,12144,7349,10934,9313,3477,6631,14469,18819,12401,3331,4569,4538,4029,7377,5588,10587,13074,10226,9568,15504,13134,9063,6828,7614,4201,5690,9125,7763,994,8226,7003,8582,7716,6593,7455,3874,3329,5442,4091,7079,10472,6246,10155,4844,5149,3161,5863,5843,11231,3451,6093,7652,8726,9032,7245,6664,10312,6325,3745,7369,5019,6658,8144,8150,14693,14622,11504,10095,2295,5942,3508,8959,5619,9515,8844,5453,4414,16870,
2235,7809,6861,11506,9191,6673,5105,6748,6379,5245,16502,12031,4711,6286,13222,7670,5758,988,9725,8416,6157,10693,10748,6928,3581,3759,5231,3659,4018,6042,7059,4184,7650,9856,6569,10243,7765,3156,6791,4053,6673,6762,10939,7234,8558,7225,5612,6035,8204,5743,7907,8317,3827,4007,3878,6589,5734,15092,7346,12804,10235,8997,9344,7154,21202,4044,4507,7172,3801,6022,3785,5023,6100,13140,2804,5714,7103,5285,16365,5646,6334,8317,12188,9105,10312,5025,5548,13098,2941,3094,4777,4943,4917,10480,11570,18584,10354,8511,7787,5123,6983,8576,11825,17036,7863,6232,4888,6587,17324,10678,6567,7530,9674,7245,9657,2764,9707,3223,9599,7953,4031,7962,10534,13419,4219,11942,6917,6773,5123,4910,3067,5942,7249,9583,12064,10837,10177,7167,7087,12616,13267,9211,10394,10126,14166,17656,13198,13785,9389,7967,10912,3958,6569,8418,8751,5083,17725,22786,23665,12153,23147,16485,8150,6536,31840)

Update - June 20, 2019

When I try to force the number of bins (and thereby get higher density values), I get an error message.

This is the command:

hist(data, 
     breaks = c(5000, 10000, 15000, 20000, 25000, 30000, 
                35000, 40000, 45000, 50000, 55000, 60000), 
     freq = FALSE)

This is the message:

  

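hist.default(CrOsAr865, breaks = c(5000, 10000, 15000, 20000, :     some 'x' not counted; maybe 'breaks' do not span range of 'x'

2 answers:

Answer 0: (score: 0)

The density values on your y-axis are correct. The area under a density function must equal 1, but that does not mean your y-axis will run from 0 to 1. The standard deviation of your data is 8601.927, so your distribution is stretched out and the density at any single value on the x-axis is very small. To illustrate this, we can generate some normal distributions with different SDs: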
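For reference, this message is raised when some values fall outside the supplied breaks. The breaks above start at 5000 and end at 60000, while the data are said to run from 987 to 61,515. A minimal sketch of one way to build breaks that cover the whole range, using a small made-up sample (x and width are illustrative names, not the dataset or variables above):

```r
# Hypothetical sample spanning roughly the same range as the question's data
x <- c(987, 4200, 9500, 14800, 22000, 36000, 61515)

# Breaks must start at or below min(x) and end at or above max(x)
width  <- 5000
breaks <- seq(floor(min(x) / width) * width,
              ceiling(max(x) / width) * width,
              by = width)

# plot = FALSE only computes the bins; densities are returned either way
h <- hist(x, breaks = breaks, plot = FALSE)

range(breaks)                    # lower and upper edge of the bins
sum(h$density * diff(h$breaks))  # total bar area
```

Calling hist(x, breaks = breaks, freq = FALSE) with the same breaks would draw the density histogram instead of only computing it.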

library(ggplot2)
library(tidyr)

# Four normal samples with the same mean (1000) and increasing SD
tibble(`SD = 1` = rnorm(1000, 1000, 1),
       `SD = 10` = rnorm(1000, 1000, 10),
       `SD = 100` = rnorm(1000, 1000, 100),
       `SD = 1000` = rnorm(1000, 1000, 1000)) %>% 
    gather(labs, vals) %>%                  # long format: one row per value
    ggplot(aes(x = vals)) + 
    geom_density() + 
    facet_wrap(~ labs, scales = "free")     # each panel gets its own axis scales

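[Figure: density plots of the four distributions, one panel per SD, each with its own axis scales]

The y scale shrinks because the data are being stretched out. It looks a little odd, since the distributions appear so similar on their different y scales, but if we view them on the same scale we can see how they progressively stretch out (note that I have changed the SDs to highlight the change in shape):

[Figure: the distributions plotted on a common scale]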
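The same point can be checked numerically without plotting. This is a small sketch using the theoretical normal density rather than simulated samples: the height of the curve at its mean is 1 / (sd * sqrt(2 * pi)), so multiplying the SD by 10 divides the peak by 10 while the area stays at 1.

```r
sds  <- c(1, 10, 100, 1000)

# Height of the normal density at its mean, for each SD
peak <- dnorm(0, mean = 0, sd = sds)

# peak * sd is the same constant, 1 / sqrt(2 * pi), in every row
data.frame(sd = sds, peak = peak, peak_times_sd = peak * sds)
```

With an SD in the thousands, as in the question, a peak density far below 1 is therefore what one would expect.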

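Answer 1: (score: 0)

As already mentioned, the area of a density plot or histogram equals 1. You can easily see this in your example by capturing the output of the hist() function: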

out <- hist(data, freq=FALSE)
sum(out$density)
# [1] 0.0002
sum(out$density) * 5000
# [1] 1

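out is a list with 6 components. out$density contains the density values shown on the y-axis. They sum to only 0.0002. To get the area of the histogram, we need to multiply the density values by the width of the bars. Looking at out$breaks, we can see that each bar is 5000 units wide. When we multiply the densities by that width, the sum is 1.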
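The width does not have to be hard-coded: diff(out$breaks) gives the width of every bar. The sketch below uses simulated values as a stand-in for the question's data (the mean and SD are arbitrary assumptions). It also computes the share of observations in each bin, which is a quantity that does lie between 0 and 1, in case that is what was wanted on the y-axis.

```r
set.seed(1)
x <- rnorm(1000, mean = 15000, sd = 8600)  # hypothetical stand-in data

out    <- hist(x, plot = FALSE)            # compute bins without drawing
widths <- diff(out$breaks)                 # width of each bar

area <- sum(out$density * widths)          # total area of the histogram
prop <- out$counts / sum(out$counts)       # share of observations per bin

area
round(prop, 3)
```

Here prop equals out$density * widths bin by bin, so something like barplot(prop, names.arg = out$mids) would show proportions on the y-axis instead of densities.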