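Histogram of date differences - grouped

Time: 2011-09-13 14:12:39

Tags: r

A few days ago I asked a question about how to get a histogram of date differences. I want to do the same thing, but by group and as box plots, using lattice's bwplot. Basically, I want a single image containing 5 box plots for 5 different sources (my example below shows 2), similar to this image.

I have spent a lot of time trying to figure this out, but I can't get it to work.

The closest I have come: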

df <- read.csv("~/dates.csv", header = TRUE, sep = ",", quote = "\"")
a <- aggregate(as.POSIXct(as.character(df$REQUEST_DATE), format="%m/%d/%Y %H:%M:%S"), list(SOURCE=df$SOURCE), diff) # not sure if this is right (and I need -diff, but can't do that)
# now what?  I seem to know how to access a$SOURCE, but don't know how to look at the data associated with a$SOURCE.

Data (~/dates.csv):

"SOURCE","REQUEST_DATE"
"A","09/11/2011 09:28:48"
"A","09/11/2011 09:21:15"
"A","09/11/2011 09:15:42"
"A","09/11/2011 09:12:18"
"D","09/13/2011 09:06:53"
"D","09/13/2011 09:06:18"
"D","09/13/2011 08:56:55"
"D","09/13/2011 08:56:18"
"D","09/13/2011 08:55:43"
"D","09/13/2011 08:39:07"

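1 answer:

Answer 0 (score: 3)

Here is a solution that uses the plyr package for the data analysis and the ggplot2 package for plotting:

Read the data. Note the use of stringsAsFactors=FALSE, which saves a lot of trouble converting with as.character later: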

df <- read.csv(textConnection('
"SOURCE","REQUEST_DATE"
"A","09/11/2011 09:28:48"
"A","09/11/2011 09:21:15"
"A","09/11/2011 09:15:42"
"A","09/11/2011 09:12:18"
"D","09/13/2011 09:06:53"
"D","09/13/2011 09:06:18"
"D","09/13/2011 08:56:55"
"D","09/13/2011 08:56:18"
"D","09/13/2011 08:55:43"
"D","09/13/2011 08:39:07"
'), stringsAsFactors=FALSE)
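To see what stringsAsFactors changes, here is a minimal sketch on a two-row excerpt of the sample data. The names csv_text, as_strings and as_factors are made up for illustration. In R 4.0.0 and later, FALSE is already the default, so the argument mainly matters on older versions.

```r
# Two-row excerpt of the sample data (illustrative only)
csv_text <- '"SOURCE","REQUEST_DATE"
"A","09/11/2011 09:28:48"
"D","09/13/2011 09:06:53"'

# Same input read both ways
as_strings <- read.csv(text = csv_text, stringsAsFactors = FALSE)
as_factors <- read.csv(text = csv_text, stringsAsFactors = TRUE)

class(as_strings$REQUEST_DATE)  # character
class(as_factors$REQUEST_DATE)  # factor
```

With the character version, the date column can be passed straight to as.POSIXct.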

Convert to POSIX date format:

df$REQUEST_DATE <- as.POSIXct(df$REQUEST_DATE, format="%m/%d/%Y %H:%M:%S")
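When a format is supplied, as.POSIXct returns NA for strings that do not match it rather than raising an error, so a quick check after converting can be worthwhile. The sketch below uses a made-up bad value and assumes that UTC is an acceptable time zone; the original code relies on the local time zone instead.

```r
# Two valid timestamps from the sample data plus one deliberately bad value
dates <- c("09/11/2011 09:28:48", "09/11/2011 09:21:15", "not a date")

# tz = "UTC" is an assumption for this sketch
parsed <- as.POSIXct(dates, format = "%m/%d/%Y %H:%M:%S", tz = "UTC")

# Flag the entries that failed to parse
bad <- is.na(parsed)
dates[bad]
```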

Load plyr and use ddply to a) group by SOURCE, b) compute the time differences, and c) combine the results into a data.frame, all in one step:

library(plyr)
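# diff() on POSIXct picks its units per group (minutes for A, seconds for D here), so fix them to seconds
df_diff <- ddply(df, .(SOURCE), summarize, TIME_DIFF=-as.numeric(diff(REQUEST_DATE), units="secs"))
df_diff
  SOURCE TIME_DIFF
1      A       453
2      A       333
3      A       204
4      D        35
5      D       563
6      D        37
7      D        35
8      D       996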
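For readers who would rather not depend on plyr, the same grouped difference can be sketched in base R. This is only one possible approach, not part of the original answer. It runs on a six-row excerpt of the sample data, the names gaps and df_secs are made up, and tz = "UTC" is an assumption. The units are fixed to seconds explicitly, because diff() on POSIXct otherwise chooses its units automatically on each call.

```r
# Six-row excerpt of the sample data, already converted to POSIXct
df <- data.frame(
  SOURCE = c("A", "A", "A", "D", "D", "D"),
  REQUEST_DATE = as.POSIXct(
    c("09/11/2011 09:28:48", "09/11/2011 09:21:15", "09/11/2011 09:15:42",
      "09/13/2011 09:06:53", "09/13/2011 09:06:18", "09/13/2011 08:56:55"),
    format = "%m/%d/%Y %H:%M:%S", tz = "UTC"),
  stringsAsFactors = FALSE
)

# Split the timestamps by source, then take negated successive
# differences, always expressed in seconds
gaps <- lapply(split(df$REQUEST_DATE, df$SOURCE), function(x) {
  -as.numeric(diff(x), units = "secs")
})

# Reassemble into a long data.frame with one row per gap
df_secs <- data.frame(
  SOURCE = rep(names(gaps), lengths(gaps)),
  TIME_DIFF = unlist(gaps, use.names = FALSE)
)
df_secs
```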

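Load ggplot2 and plot. The plot looks a bit rough because the sample data set is small. It will work better with a larger data set, i.e. you will get a clear separation between the median, the range, and the outliers.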

library(ggplot2)
ggplot(df_diff, aes(y=TIME_DIFF, x=SOURCE)) + geom_boxplot()
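Since the question asked specifically about lattice's bwplot, here is a rough lattice equivalent. It is a sketch, not part of the original answer. The data frame is rebuilt inline from the gaps in the sample data, expressed in seconds, and the axis label is my own wording.

```r
library(lattice)

# Gaps between successive requests in the sample data, in seconds
df_diff <- data.frame(
  SOURCE = factor(c("A", "A", "A", "D", "D", "D", "D", "D")),
  TIME_DIFF = c(453, 333, 204, 35, 563, 37, 35, 996)
)

# One box per source; a factor on the right-hand side gives vertical boxes
p <- bwplot(TIME_DIFF ~ SOURCE, data = df_diff,
            ylab = "Seconds between requests")

# lattice plots are objects, so print explicitly when running in a script or function
print(p)
```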
