Add shapes at the start and end of lines, and at some interval along the lines, defined by a grouping variable

Time: 2013-08-15 13:06:22

Tags: r ggplot2

This is my df (almost 100,000 rows and 10 ID values):

               Date.time       P    ID
    1   2013-07-03 12:10:00 1114.3  J9335
    2   2013-07-03 12:20:00 1114.5  K0904
    3   2013-07-03 12:30:00 1114.3  K0904
    4   2013-07-03 12:40:00 1114.1  K1136
    5   2013-07-03 12:50:00 1114.1  K1148
............

Using ggplot I create this plot:

ggplot(df) + geom_line(aes(Date.time, P, group=ID, colour=ID))

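The plot itself is fine. However, I also need to print it in black and white, where separating the lines by colour is not a sensible choice. I tried distinguishing the IDs by line type, but the result was not satisfactory. So my idea is to add a different symbol at the start and at the end of each line, so that the IDs can also be told apart on black-and-white paper.
I added these lines: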

geom_point(data=df, aes(x=min(Date.time), y=P, shape=ID))+
geom_point(data=df, aes(x=max(Date.time), y=P, shape=ID)) 

But an error occurs. Any suggestions?

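Given that each line consists of roughly 5,000 to 10,000 values, it is not feasible to plot every value as a separate symbol. A solution could be to draw the lines and then add points at regular breaks, with a different symbol for each ID (for example, one symbol every 500 values). Is that possible?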

2 answers:

Answer 0: (score: 3)

How about adding a geom_point that uses a subset of the data containing only the min and max time values?

# some data
df <- data.frame(
  ID = rep(c("a", "b"), each = 4),
  Date.time = rep(seq(Sys.time(), by = "hour", length.out = 4), 2),
  P = sample(1:10, 8))
df

# create a subset with min and max time values
# if min(x) and max(x) is the same for each ID:
df_minmax <- subset(x= df, subset = Date.time == min(Date.time) | Date.time == max(Date.time))

# if min(x) and max(x) may differ between ID,
# calculate min and max values *per* ID
# Here I use ddply, but several other aggregating functions in base R will do as well.
library(plyr)
df_minmax <- ddply(.data = df, .variables = .(ID), subset,
             Date.time == min(Date.time) | Date.time == max(Date.time))


gg <- ggplot(data = df, aes(x = Date.time, y = P)) +
  geom_line(aes(group = ID, colour = ID)) +
  geom_point(data = df_minmax, aes(shape = ID))

gg
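
The code comment above notes that base R aggregating functions can do the same job as ddply. As a minimal sketch of one such alternative, assuming a toy data frame (made up here for illustration) in which the two IDs cover different time ranges, ave() can broadcast the per-ID minimum and maximum back to every row:

```r
# toy data (hypothetical): the two IDs cover different time ranges
df <- data.frame(
  ID = rep(c("a", "b"), each = 4),
  Date.time = as.POSIXct("2013-07-03 12:00:00", tz = "UTC") +
    3600 * c(0:3, 2:5),
  P = c(3, 7, 1, 9, 4, 8, 2, 6)
)

# ave() returns, for every row, the min (or max) within that row's ID;
# work on the numeric time so the comparison is number against number
t_num <- as.numeric(df$Date.time)
is_first <- t_num == ave(t_num, df$ID, FUN = min)
is_last  <- t_num == ave(t_num, df$ID, FUN = max)

# keep only the first and last observation of each ID
df_minmax <- df[is_first | is_last, ]
df_minmax
```

The resulting df_minmax has the same columns as df, so it can be passed to geom_point(data = df_minmax, aes(shape = ID)) in the same way as the ddply version.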

If you wish to control the shape, have a look at ?scale_shape_discrete and the examples there.
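
As a rough sketch of what controlling the shapes could look like for a black-and-white print, the example below maps each ID to a fixed plotting symbol with scale_shape_manual() and drops the colour mapping. The data, the shape_values vector and the chosen symbol codes are assumptions made for illustration, not part of the original answer.

```r
library(ggplot2)

# toy data (hypothetical): both IDs share the same time stamps
df <- data.frame(
  ID = rep(c("a", "b"), each = 4),
  Date.time = rep(as.POSIXct("2013-07-03 12:00:00", tz = "UTC") +
                    3600 * 0:3, 2),
  P = c(3, 7, 1, 9, 4, 8, 2, 6)
)

# first and last time stamp (identical for both IDs in this toy data)
df_minmax <- df[df$Date.time == min(df$Date.time) |
                  df$Date.time == max(df$Date.time), ]

# one plotting symbol per ID: 16 = filled circle, 17 = filled triangle
shape_values <- c(a = 16, b = 17)

gg <- ggplot(df, aes(x = Date.time, y = P, group = ID)) +
  geom_line() +                                   # all lines in black
  geom_point(data = df_minmax, aes(shape = ID), size = 3) +
  scale_shape_manual(values = shape_values)
gg
```

With ten IDs, shape_values would need ten entries; the default discrete shape scale offers only a handful of symbols, so a manual scale is probably the safer choice there.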

Edit following the updated question
For each ID, add shapes to the line at some interval.

# create a slightly larger data set
df <- data.frame(
  ID = rep(c("a", "b"), each = 100),
  Date.time = rep(seq(Sys.time(), by = "day", length.out = 100), 2),
  P = c(sample(1:10, 100, replace = TRUE), sample(11:20, 100, replace = TRUE)))


# for each ID:
# create a time sequence from min(time) to max(time), by some time step
# e.g. a week
df_gap <- ddply(.data = df, .variables = .(ID), summarize,
             Date.time =
                  seq(from = min(Date.time), to = max(Date.time), by = "week"))

# add P from df to df_gap
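df_gap <- merge(x = df_gap, y = df)

# recompute the per-ID min/max subset for this larger df
# (the df_minmax created earlier belongs to the small data set)
df_minmax <- ddply(.data = df, .variables = .(ID), subset,
             Date.time == min(Date.time) | Date.time == max(Date.time))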


gg <- ggplot(data = df, aes(x = Date.time, y = P)) +
    geom_line(aes(group = ID, colour = ID)) +
    geom_point(data = df_gap, aes(shape = ID)) +
    # if the time span of the data is not a multiple of the gap,
    # you may wish to add the max points as well
    geom_point(data = df_minmax, aes(shape = ID))

gg
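
The question also asks about placing one symbol every fixed number of values (for example every 500) rather than every fixed time step. A minimal base R sketch of that variant follows; the toy data, the step n and the name df_every_n are hypothetical, and the rows are assumed to be sorted by Date.time within each ID.

```r
# toy data (hypothetical): 2 IDs with 20 observations each
df <- data.frame(
  ID = rep(c("a", "b"), each = 20),
  Date.time = rep(as.POSIXct("2013-07-03 12:00:00", tz = "UTC") +
                    600 * 0:19, 2),
  P = c(1:20, 21:40)
)

n <- 5  # hypothetical step; the question suggests something like 500

# position of each row within its own ID (1, 2, 3, ...);
# assumes rows are already sorted by Date.time within each ID
pos <- ave(seq_len(nrow(df)), df$ID, FUN = seq_along)

# keep the 1st, (n+1)th, (2n+1)th, ... row of every ID
df_every_n <- df[(pos - 1) %% n == 0, ]
df_every_n
```

Because df_every_n is a plain subset of df, it already contains P and needs no merge; it could be used in place of df_gap in geom_point(data = ..., aes(shape = ID)).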

Answer 1: (score: 1)

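The error stems from the fact that the single value min(Date.time) does not match the length of the vectors P or ID. Another problem may be that you are re-declaring your data variable, even though you already have ggplot(df).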

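The solution that immediately comes to mind is to figure out the row indices of the min and max dates. If all IDs share the same min and max time stamps, then it is simple. Use the which() function to get the array of row numbers you need.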

min.index <- which(df$Date.time == min(df$Date.time))
max.index <- which(df$Date.time == max(df$Date.time))

Then use these arrays as indices.

geom_point(aes(x=Date.time[min.index], y=P[min.index], shape=ID[min.index]))+
geom_point(aes(x=Date.time[max.index], y=P[max.index], shape=ID[max.index]))
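
The which() approach above relies on all IDs sharing the same min and max time stamps. If they do not, one possible variation (a sketch only, with made-up toy data and the hypothetical helper names rows_by_id and df_ends) is to look up the indices separately within each ID:

```r
# toy data (hypothetical): the two IDs cover different time ranges
df <- data.frame(
  ID = rep(c("a", "b"), each = 4),
  Date.time = as.POSIXct("2013-07-03 12:00:00", tz = "UTC") +
    3600 * c(0:3, 2:5),
  P = c(3, 7, 1, 9, 4, 8, 2, 6)
)

# row numbers of df, grouped by ID
rows_by_id <- split(seq_len(nrow(df)), df$ID)

# within each ID, pick the row number of the earliest / latest time stamp
min.index <- sapply(rows_by_id, function(i)
  i[which.min(as.numeric(df$Date.time[i]))])
max.index <- sapply(rows_by_id, function(i)
  i[which.max(as.numeric(df$Date.time[i]))])

# rows holding the start and end point of every ID
df_ends <- df[c(min.index, max.index), ]
df_ends
```

Passing df_ends as the data argument of a single geom_point(data = df_ends, aes(x = Date.time, y = P, shape = ID)) layer avoids indexing inside aes().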