Understanding density in R

Time: 2014-02-06 18:39:34

Tags: r ggplot2 visualization density-plot

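I'm trying to wrap my head around density: what kind of units does it produce?

Some actual data is included below. Assume log_sample$TS is in seconds. We can plot the density by event: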

    library(ggplot2)
    ggplot(log_sample, aes(x=TS)) + geom_density(aes(colour=event))

[Figure: Initial Density]

My intuition about density (an estimate of events / second) breaks down when the timestamps are rescaled to hours:

    ggplot(log_sample, aes(x=TS/(60*60))) + geom_density(aes(colour=event))

[Figure: Hours Density]

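I would expect the density to go up (events / hour), but the Y axis gets smaller.

What exactly is the density that comes out of R's density() (which is what geom_density uses)?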


Sample data

    log_sample = structure(list(TS = c(5936781453, 5424106429, 3051226836, 3780602571, 
    4836845109, 5718264549, 879774681, 3693468059, 2007748562, 2504334226, 
    624948758, 5712390144, 3169326817, 2716096605, 1108085248, 5668904375, 
    6559186646, 21095572, 3875508209, 4315196759, 5253007933, 4702915059, 
    6498649004, 5606316102, 3886402298, 2552276252, 6055089961, 87782977, 
    1792383661, 1525444570, 2423674627, 2698516549, 770431980, 2249099432, 
    5560812828, 5140968169, 4938716355, 7446015137, 3697083581, 5000572471, 
    2748254652, 6697149589, 3718191398, 6123529413, 2459883463, 2521530177, 
    5570098130, 4360374786, 311727922, 6026773996, 4889601125, 3358303391, 
    1822623672, 7514080648, 2892349471, 6832359196, 5011293787, 443364160, 
    5220940964, 5254117874, 5337279943, 5208529127, 4180004131, 4053678140, 
    5911956363, 380893281, 2018033389, 842548954, 7497672544, 2724869215, 
    1958679125, 4069038129, 3397592985, 2328548539, 5049321404, 6783632939, 
    1657654904, 2707346266, 892475725, 5327372333, 1037573029, 3319817079, 
    5009282140, 7265205425, 108382115, 5125317279, 2767672973, 158006399, 
    3973921838, 1529684154, 2631744541, 2343000246, 584037151, 2811442843, 
    224371846, 6117606277, 6495065662, 4023007200, 3664433941, 5606111439
    ), event = c("c", "c", "b", "b", "b", "b", "c", "b", "c", "c", 
    "c", "c", "b", "b", "b", "c", "c", "c", "c", "b", "b", "b", "c", 
    "r", "c", "c", "c", "b", "c", "c", "c", "c", "b", "c", "b", "c", 
    "r", "c", "c", "c", "c", "b", "c", "b", "b", "b", "b", "c", "b", 
    "r", "b", "b", "b", "b", "c", "b", "c", "b", "r", "c", "c", "c", 
    "b", "b", "b", "b", "c", "c", "b", "c", "c", "c", "b", "c", "b", 
    "r", "b", "b", "c", "c", "c", "c", "c", "r", "c", "b", "b", "c", 
    "b", "c", "c", "b", "b", "c", "c", "c", "c", "b", "b", "b")), .Names = c("TS", 
    "event"), row.names = c(943411L, 610939L, 1419805L, 794230L, 
    5117419L, 5198213L, 4312722L, 1443299L, 3360370L, 3703742L, 1989592L, 
    2882113L, 2082613L, 2725174L, 39266L, 2553302L, 2920469L, 4938431L, 
    4093867L, 3444703L, 2521564L, 2465041L, 2918392L, 4854160L, 3429030L, 
    3380282L, 953508L, 1639160L, 4017713L, 2022520L, 4369194L, 2391770L, 
    26864L, 1390462L, 4523739L, 4820972L, 3478285L, 332872L, 791177L, 
    4805164L, 1408718L, 5232955L, 1771935L, 2259467L, 3376903L, 2385297L, 
    4852010L, 3771602L, 4619512L, 4221952L, 3472587L, 3734953L, 4018822L, 
    1308366L, 4057947L, 4573824L, 545463L, 3303167L, 4502527L, 3837677L, 
    4184887L, 4174426L, 1461097L, 147448L, 2566731L, 3300883L, 72689L, 
    3317772L, 4935292L, 4380180L, 4352184L, 148011L, 2750094L, 421915L, 
    4157011L, 2929336L, 4341175L, 1081379L, 2992396L, 4183930L, 4646073L, 
    120493L, 2166828L, 3609199L, 986390L, 2181468L, 1737202L, 342543L, 
    4425869L, 1691913L, 3056016L, 4366245L, 3633507L, 4710969L, 3295152L, 
    232484L, 1623483L, 803098L, 3420917L, 5192365L), class = "data.frame")

1 answer:

Answer 0 (score: 2)

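The y-axis in these plots is the kernel density estimate, which by default is normalized so that the area under the curve is 1. Its units are therefore 1 / (units of x), not events per unit of time. If the range of x values grows, the y values shrink, and vice versa. As has already been pointed out, the y values in the first plot (seconds) are much smaller than those in the second plot (hours).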
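
The normalization can be checked directly with base R's density(). The following is a minimal sketch, not taken from the original post: it uses synthetic uniform timestamps as a hypothetical stand-in for log_sample$TS, and the helper `area()` is an illustrative name. It shows that the area under each curve stays close to 1, that dividing x by 3600 multiplies the density by roughly 3600, and that multiplying the density by the number of observations gives something closer to an "events per unit of x" curve.

```r
# Hypothetical stand-in for log_sample$TS: 100 timestamps assumed to be in seconds.
set.seed(1)
ts_sec  <- runif(100, min = 0, max = 7.5e9)
ts_hour <- ts_sec / 3600                     # the same timestamps in hours

d_sec  <- density(ts_sec)                    # default Gaussian kernel, bw.nrd0 bandwidth
d_hour <- density(ts_hour)

# Trapezoidal approximation of the area under an estimated density curve
# (illustrative helper, not part of any package).
area <- function(d) sum(diff(d$x) * (head(d$y, -1) + tail(d$y, -1)) / 2)

area_sec  <- area(d_sec)                     # close to 1
area_hour <- area(d_hour)                    # also close to 1

# Shrinking the x range by a factor of 3600 raises the curve by about that factor.
peak_ratio <- max(d_hour$y) / max(d_sec$y)

# Density times the number of observations: events per hour along the x axis.
rate_per_hour <- d_hour$y * length(ts_hour)

print(c(area_sec = area_sec, area_hour = area_hour, peak_ratio = peak_ratio))
```

In ggplot2, mapping `y = after_stat(count)` inside `geom_density()` (available in recent ggplot2 versions, 3.3.0 and later as far as I know) should plot the same density-times-n quantity, which is probably closer to the events / hour reading the question was after.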