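A few days ago I asked a question about how to get a histogram of date differences. I'd like to do the same thing, but grouped and as box plots, using lattice's bwplot. Basically, I want 1 image containing 5 box plots for 5 different sources (I show 2 in the example below) - something like this.
I've spent a lot of time trying to figure this out, but can't get it.
The closest I've gotten: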
df <- read.csv("~/dates.csv", header = TRUE, sep = ",", quote = "\"")
a <- aggregate(as.POSIXct(as.character(df$REQUEST_DATE), format="%m/%d/%Y %H:%M:%S"), list(SOURCE=df$SOURCE), diff) # not sure if this is right (and I need -diff, but can't do that)
# now what? I seem to know how to access a$SOURCE, but don't know how to look at the data associated with a$SOURCE.
Data (~/dates.csv):
"SOURCE","REQUEST_DATE"
"A","09/11/2011 09:28:48"
"A","09/11/2011 09:21:15"
"A","09/11/2011 09:15:42"
"A","09/11/2011 09:12:18"
"D","09/13/2011 09:06:53"
"D","09/13/2011 09:06:18"
"D","09/13/2011 08:56:55"
"D","09/13/2011 08:56:18"
"D","09/13/2011 08:55:43"
"D","09/13/2011 08:39:07"
Answer 0 (score: 3)
Here is a solution that uses the plyr package for the data analysis and the ggplot2 package for plotting:
Read the data. Note the use of stringsAsFactors=FALSE - this saves a lot of hassle converting with as.character later:
df <- read.csv(textConnection('
"SOURCE","REQUEST_DATE"
"A","09/11/2011 09:28:48"
"A","09/11/2011 09:21:15"
"A","09/11/2011 09:15:42"
"A","09/11/2011 09:12:18"
"D","09/13/2011 09:06:53"
"D","09/13/2011 09:06:18"
"D","09/13/2011 08:56:55"
"D","09/13/2011 08:56:18"
"D","09/13/2011 08:55:43"
"D","09/13/2011 08:39:07"
'), stringsAsFactors=FALSE)
Convert to POSIX date format:
df$REQUEST_DATE <- as.POSIXct(df$REQUEST_DATE, format="%m/%d/%Y %H:%M:%S")
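Before going further it may be worth checking that every timestamp actually parsed, since as.POSIXct returns NA for strings that do not match the format instead of raising an error. A minimal sketch on a small vector; the tz = "UTC" argument and the deliberately bad third value are assumptions added for illustration, not part of the original data:

```r
# Two timestamps from the sample data plus one deliberately malformed value
x <- c("09/11/2011 09:28:48", "09/11/2011 09:21:15", "not a date")

# tz = "UTC" is an assumption; use the zone the timestamps were recorded in
parsed <- as.POSIXct(x, format = "%m/%d/%Y %H:%M:%S", tz = "UTC")

# Strings that do not match the format come back as NA
bad <- is.na(parsed)
print(x[bad])
```

If the final line prints anything, those rows need cleaning before the differences are computed.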
Load plyr and use ddply to a) group by SOURCE, b) calculate the time differences, and c) combine the results into a data.frame, all in one step:
library(plyr)
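# diff() on POSIXct picks its units automatically for each group (minutes for A,
# seconds for D with this data), so fix the units to seconds before comparing groups
df_diff <- ddply(df, .(SOURCE), summarize, TIME_DIFF=-as.numeric(diff(REQUEST_DATE), units="secs"))
df_diff
  SOURCE TIME_DIFF
1      A       453
2      A       333
3      A       204
4      D        35
5      D       563
6      D        37
7      D        35
8      D       996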
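If you would rather not depend on plyr, the same grouping can be sketched in base R with split and lapply. One thing to watch: diff() on POSIXct values returns a difftime whose units are chosen automatically from the size of the differences, so different groups can end up in different units unless you pin them. The sketch below pins the units to seconds; the inline data frame and tz = "UTC" are assumptions so that the block runs on its own:

```r
# Sample data from the question, built inline
df <- data.frame(
  SOURCE = c(rep("A", 4), rep("D", 6)),
  REQUEST_DATE = as.POSIXct(
    c("09/11/2011 09:28:48", "09/11/2011 09:21:15", "09/11/2011 09:15:42",
      "09/11/2011 09:12:18", "09/13/2011 09:06:53", "09/13/2011 09:06:18",
      "09/13/2011 08:56:55", "09/13/2011 08:56:18", "09/13/2011 08:55:43",
      "09/13/2011 08:39:07"),
    format = "%m/%d/%Y %H:%M:%S", tz = "UTC"),
  stringsAsFactors = FALSE
)

# Split the timestamps by source, then take negated consecutive
# differences, always expressed in seconds
diffs <- lapply(split(df$REQUEST_DATE, df$SOURCE), function(d) {
  -as.numeric(diff(d), units = "secs")
})

# Stack the per-source vectors back into one long data.frame
df_diff <- data.frame(
  SOURCE = rep(names(diffs), lengths(diffs)),
  TIME_DIFF = unlist(diffs, use.names = FALSE)
)
print(df_diff)
```

The intermediate list diffs also shows how to look at the values belonging to a single source, e.g. diffs[["A"]].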
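Load ggplot2 and plot. The plot looks a bit rubbish - this is because the sample data set is small. It will work better with a larger data set, i.e. you will get a clear separation between the median, the range and the outliers.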
library(ggplot2)
ggplot(df_diff, aes(y=TIME_DIFF, x=SOURCE)) + geom_boxplot()
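Since the question asked specifically for lattice's bwplot, here is a rough equivalent of the ggplot2 call. It is only a sketch: the data frame below is a stand-in for df_diff with the time differences of the sample data expressed in seconds, and the axis labels are my own choice.

```r
library(lattice)

# Stand-in for df_diff: differences between consecutive requests, in seconds
df_diff <- data.frame(
  SOURCE = factor(c("A", "A", "A", "D", "D", "D", "D", "D")),
  TIME_DIFF = c(453, 333, 204, 35, 563, 37, 35, 996)
)

# One box per level of SOURCE, all in a single panel
p <- bwplot(TIME_DIFF ~ SOURCE, data = df_diff,
            xlab = "Source", ylab = "Time between requests (seconds)")
print(p)
```

With the real data, the five sources would simply appear as five levels of SOURCE and so as five boxes in the same image.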