Split a data frame into groups by time and apply a function to multiple columns in R

Time: 2014-08-08 15:21:11

Tags: r ggplot2

The data frame sg looks like this:

           time                B   C D
1  2014-08-04 00:00:04.0       red 0 0
2  2014-08-04 00:00:06.0       red 0 0
3  2014-08-04 00:00:06.0       red 1 0
4  2014-08-04 00:00:06.2       red 0 0
5  2014-08-04 00:00:06.5       red 0 0
6  2014-08-04 00:00:07.0       red 0 1
7  2014-08-04 00:00:07.7       red 0 0
8  2014-08-04 00:00:16.0       red 0 0
9  2014-08-04 00:00:17.0       red 1 0
10 2014-08-04 00:00:18.0       red 0 0
11 2014-08-04 00:00:22.0       red 0 0
12 2014-08-04 00:00:22.0       red 0 0
13 2014-08-04 00:00:22.2       red 0 0
14 2014-08-04 00:00:25.0       red 1 0
15 2014-08-04 00:00:27.0       red 1 0
16 2014-08-04 00:00:28.0       red 0 0
17 2014-08-04 00:00:29.0 red/amber 1 0
18 2014-08-04 00:00:29.0 red/amber 1 1
19 2014-08-04 00:00:30.0     green 0 0
20 2014-08-04 00:00:40.0     green 0 1
21 2014-08-04 00:00:42.4     green 0 0
22 2014-08-04 00:00:43.0     green 0 0
23 2014-08-04 00:00:50.0       red 1 0
24 2014-08-04 00:00:51.2       red 0 0
25 2014-08-04 00:00:52.0       red 0 1
26 2014-08-04 00:00:52.0       red 1 0
27 2014-08-04 00:00:52.2       red 1 0
28 2014-08-04 00:00:52.9       red 1 1
29 2014-08-04 00:00:53.0       red 0 0
30 2014-08-04 00:00:59.0       red 0 1
31 2014-08-04 00:01:02.0       red 0 1
32 2014-08-04 00:01:03.2       red 0 1
33 2014-08-04 00:01:04.0       red 1 1
34 2014-08-04 00:01:06.4       red 0 1
35 2014-08-04 00:01:07.5       red 1 1
36 2014-08-04 00:01:08.0       red 0 1
37 2014-08-04 00:01:08.2       red 0 1
38 2014-08-04 00:01:08.4       red 0 1
39 2014-08-04 00:01:11.0       red 0 1
40 2014-08-04 00:01:13.0       red 0 1
41 2014-08-04 00:01:14.0       red 0 1
42 2014-08-04 00:01:15.0 red/amber 0 1
43 2014-08-04 00:01:15.0 red/amber 0 1
44 2014-08-04 00:01:16.0     green 0 1
45 2014-08-04 00:01:21.0     green 0 0
46 2014-08-04 00:01:26.0     green 0 0
47 2014-08-04 00:01:31.0     amber 0 0
48 2014-08-04 00:01:31.0     amber 0 0
49 2014-08-04 00:01:34.0       red 0 0
50 2014-08-04 00:01:36.0       red 0 0

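First, I need to split the data frame into groups by time interval (for example, 10 seconds). Second, I need to calculate the percentage of values equal to 1 within each group, separately for columns C and D. Finally, I want to plot the percentages of columns C and D against time in one figure.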

I have done this for a single variable. My solution is:

library(plyr)
library(ggplot2)

# Fraction of rows in a group where C equals 1
percentage.occupied <- function(x) NROW(subset(x, C == 1)) / NROW(x)

# Group rows into 10-second bins and apply the function to each group
splitbytime <- ddply(selectstatus309, .(cut(time, "10 secs")), percentage.occupied)
colnames(splitbytime) <- c("time", "occupancy")

occupancy <- ggplot(splitbytime, aes(x = as.POSIXct(time), y = occupancy)) +
  geom_point(shape = 1) +
  geom_smooth() +
  xlab("time") +
  ylab("% occupancy")

The resulting plot, drawn for column C, is shown below. What I need is to plot the percentages of C and D separately in one figure.

I am not sure whether I have described my problem clearly (┬_┬)

[image: plot of the percentage for column C]

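I adopted BrodieG's solution and applied it to one period (1 hour) of my data. I followed every step, but the plot came out wrong:

[image: plot produced from the 1-hour data]

In addition, there was an error: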

geom_smooth: method="auto" and size of largest group is <1000, so using loess. Use 'method = x' to change the smoothing method.
geom_smooth: method="auto" and size of largest group is >=1000, so using gam with formula: y ~ s(x, bs = "cs"). Use 'method = x' to change the smoothing method.
Error in smooth.construct.cr.smooth.spec(object, data, knots) : 
  x has insufficient unique values to support 10 knots: reduce k.

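I guess the error is not the cause of the strange plot. Part of the melted data frame is shown below; judging from it, the result cannot possibly be only 1 or 0.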

                     time          B variable value
10520 2014-08-04 15:10:00      green     dt_5     0
10521 2014-08-04 15:10:00      green     dt_5     0
10522 2014-08-04 15:10:00      green     dt_5     0
10523 2014-08-04 15:10:00      green     dt_5     0
10524 2014-08-04 15:10:00      green     dt_5     0
10525 2014-08-04 15:10:00      green     dt_5     0
10526 2014-08-04 15:10:00      green     dt_5     0
10527 2014-08-04 15:10:00      green     dt_5     0
10528 2014-08-04 15:10:00      green     dt_5     1
10529 2014-08-04 15:10:00      amber     dt_5     1
10530 2014-08-04 15:10:00      amber     dt_5     1
10531 2014-08-04 15:10:00      amber     dt_5     1
10532 2014-08-04 15:10:00      amber     dt_5     1
10533 2014-08-04 15:10:00      amber     dt_5     1
10534 2014-08-04 15:10:00      amber     dt_5     1
10535 2014-08-04 15:10:00      amber     dt_5     0
10536 2014-08-04 15:10:00      amber     dt_5     0
10537 2014-08-04 15:10:00      amber     dt_5     0
10538 2014-08-04 15:10:00      amber     dt_5     0
10539 2014-08-04 15:10:00      amber     dt_5     0
10540 2014-08-04 15:10:00      amber     dt_5     0
10541 2014-08-04 15:10:00        red     dt_5     0
10542 2014-08-04 15:10:00        red     dt_5     0
10543 2014-08-04 15:10:00        red     dt_5     0
10544 2014-08-04 15:10:00        red     dt_5     0
10545 2014-08-04 15:10:00        red     dt_5     0

The code is here:

selectstatus309.mlt <- melt(selectstatus309,id.var=c("time","B"))

percentage<-
  ggplot(selectstatus309.mlt, aes(x=time,y=value,color=variable))+
  stat_summary(geom="point", fun.y =mean,shape=1)+
  stat_smooth()+
  facet_wrap(~ B)

Sorry for the looooong and wordy story! T.T

1 Answer:

Answer 0 (score: 2)

Here is one option. First we bin the times into 10-second intervals:

library(reshape2)
library(ggplot2)
df$time <- as.POSIXct(cut(as.POSIXct(df$time), "10 secs"))
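
To make the binning step concrete, here is a small base R sketch. The timestamps are illustrative values in the style of the question's data, and `clock.bins` is a hypothetical alternative that is not part of the original answer. As far as I can tell, `cut()` with "10 secs" starts its first bin at the earliest timestamp rather than at a round 10-second mark, so the floor-based variant is shown for comparison.

```r
# Illustrative timestamps only (not the asker's full data).
times <- as.POSIXct(c("2014-08-04 00:00:04", "2014-08-04 00:00:06",
                      "2014-08-04 00:00:16", "2014-08-04 00:00:29",
                      "2014-08-04 00:00:30"), tz = "UTC")

# cut() returns a factor labelled with the start of each 10-second bin;
# as.POSIXct() turns those labels back into date-times for plotting.
bins <- as.POSIXct(cut(times, "10 secs"), tz = "UTC")

# Hypothetical alternative: bins aligned to the clock (:00, :10, :20, ...),
# obtained by flooring the numeric time to a multiple of 10 seconds.
clock.bins <- as.POSIXct(floor(as.numeric(times) / 10) * 10,
                         origin = "1970-01-01", tz = "UTC")

print(data.frame(times, bins, clock.bins))
```

Either variant yields one shared time value per group, which is all the later plotting steps need.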

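Then we melt it so that the values of C and D end up in the same column, which lets us map the variable to an aesthetic. This is the key step for getting both plots into the same figure. Inspect df.mlt to see how it differs from df. ggplot prefers data in long format so that it can use its built-in data-splitting tools.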

df.mlt <- melt(df, id.var=c("time", "B"))
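
As a rough illustration of what the melt step does, here is a sketch on a tiny made-up data frame (`demo.df` and its values are assumptions for the example, not the asker's data). It spells the argument out as `id.vars`, which the shorter `id.var` above partially matches.

```r
library(reshape2)

# Tiny illustrative data frame in the same shape as the question's data.
demo.df <- data.frame(
  time = as.POSIXct("2014-08-04 00:00:00", tz = "UTC") + c(0, 0, 10),
  B    = c("red", "red", "green"),
  C    = c(0, 1, 0),
  D    = c(0, 0, 1)
)

# Wide to long: C and D are stacked into one "value" column, and a new
# "variable" column records which original column each value came from.
demo.mlt <- melt(demo.df, id.vars = c("time", "B"))
print(demo.mlt)
```

Three rows with two measured columns become six rows, one per original row and measured column.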

Then we use stat_summary to plot the points (no need to resort to ddply):

ggplot(df.mlt, aes(x=time, y=value, color=variable)) + 
  stat_summary(geom="point", fun.y=mean, shape=1) + 
  stat_smooth()
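
The reason `mean` works here is that C and D contain only 0 and 1, so the mean within a time bin equals the fraction of ones. A small base R sketch with made-up data (`demo.mlt` and its values are assumptions for illustration) computes the same per-bin numbers explicitly:

```r
# Illustrative long-format data: three observations per 10-second bin and variable.
demo.mlt <- data.frame(
  time     = rep(as.POSIXct("2014-08-04 00:00:00", tz = "UTC") + c(0, 10),
                 each = 3, times = 2),
  variable = rep(c("C", "D"), each = 6),
  value    = c(0, 1, 0,  1, 1, 0,   # C: first bin, second bin
               0, 0, 0,  1, 1, 1)   # D: first bin, second bin
)

# Mean of a 0/1 column per (time bin, variable) = fraction of ones in that group.
props <- aggregate(value ~ time + variable, data = demo.mlt, FUN = mean)
print(props)
```

Note that in ggplot2 3.3.0 and later the `fun.y` argument of `stat_summary` is deprecated in favor of `fun`, so `fun = mean` is the current spelling.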

This produces (on your subset of the data):

[image: resulting plot of C and D over time]

Notice how we were able to split the data based on whether it is "C" or "D". You can even facet by B:

ggplot(df.mlt, aes(x=time, y=value, color=variable)) + 
  stat_summary(geom="point", fun.y=mean, shape=1) + 
  stat_smooth() +
  facet_wrap(~ B)

[image: resulting plot faceted by B]
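
Regarding the `x has insufficient unique values to support 10 knots: reduce k` error quoted in the question's follow-up: the message itself suggests reducing `k`. Below is a minimal sketch of one way to do that, assuming ggplot2 3.3.0 or later (for `fun` in `stat_summary`) and that the mgcv package is installed. `demo.mlt` is synthetic stand-in data, and `k = 5` is an arbitrary illustrative choice, not a recommendation.

```r
library(ggplot2)

# Synthetic stand-in for melted, 10-second-binned data (not the asker's data).
set.seed(1)
bins <- as.POSIXct("2014-08-04 00:00:00", tz = "UTC") + seq(0, 590, by = 10)
demo.mlt <- data.frame(
  time     = rep(bins, each = 10, times = 2),
  variable = rep(c("C", "D"), each = length(bins) * 10),
  value    = rbinom(2 * length(bins) * 10, size = 1, prob = 0.3)
)

# Choose the smoother explicitly instead of relying on method = "auto",
# and ask for fewer knots (k = 5) than the default of 10.
p <- ggplot(demo.mlt, aes(x = time, y = value, color = variable)) +
  stat_summary(geom = "point", fun = mean, shape = 1) +
  stat_smooth(method = "gam", formula = y ~ s(x, bs = "cs", k = 5))

# A simpler fallback when a group has very few distinct time values:
#   stat_smooth(method = "lm", formula = y ~ x)
```

When faceting by B, each panel sees only its own subset of time bins, so a panel with fewer distinct times than `k` may still fail to fit; in that case a smaller `k` or a simpler method is needed.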