This is my df (almost 100,000 rows and 10 ID values):
Date.time P ID
1 2013-07-03 12:10:00 1114.3 J9335
2 2013-07-03 12:20:00 1114.5 K0904
3 2013-07-03 12:30:00 1114.3 K0904
4 2013-07-03 12:40:00 1114.1 K1136
5 2013-07-03 12:50:00 1114.1 K1148
............
Using ggplot I create this plot:
ggplot(df) + geom_line(aes(Date.time, P, group=ID, colour=ID))
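This plot is fine. However, I now also need to print it in black and white, where separating the IDs by colour is not a sensible choice.
I tried distinguishing the IDs by line type, but the result was not satisfactory.
So my idea is to add a different symbol at the start and at the end of each line, so that the IDs can also be told apart on black-and-white paper.
I added these lines: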
geom_point(data=df, aes(x=min(Date.time), y=P, shape=ID))+
geom_point(data=df, aes(x=max(Date.time), y=P, shape=ID))
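But an error occurs. Any suggestions?
Given that each line consists of about 5,000 to 10,000 values, plotting every value as a different character is not feasible. A solution could be to draw the lines and then plot points with a different symbol for each ID at regular breaks (for example, one character every 500 values). Is that possible?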
Answer 0: (score: 3)
How about adding a geom_point that uses a subset of the data containing only the min and max time values?
# some data
df <- data.frame(
ID = rep(c("a", "b"), each = 4),
Date.time = rep(seq(Sys.time(), by = "hour", length.out = 4), 2),
P = sample(1:10, 8))
df
# create a subset with min and max time values
# if min(x) and max(x) is the same for each ID:
df_minmax <- subset(x= df, subset = Date.time == min(Date.time) | Date.time == max(Date.time))
# if min(x) and max(x) may differ between ID,
# calculate min and max values *per* ID
# Here I use ddply, but several other aggregating functions in base R will do as well.
library(plyr)
df_minmax <- ddply(.data = df, .variables = .(ID), subset,
Date.time == min(Date.time) | Date.time == max(Date.time))
gg <- ggplot(data = df, aes(x = Date.time, y = P)) +
geom_line(aes(group = ID, colour = ID)) +
geom_point(data = df_minmax, aes(shape = ID))
gg
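The comment above notes that base R aggregating functions would work as well. A minimal sketch of one such variant, assuming `ave()` as the grouping helper (one possible choice, not something the answer prescribes) and a made-up toy data set in which the two IDs cover different time ranges:

```r
# toy data, invented for illustration: the IDs have different time ranges
start <- as.POSIXct("2013-07-03 12:10:00", tz = "UTC")
df <- data.frame(
  ID = rep(c("a", "b"), each = 4),
  Date.time = c(start + (0:3) * 600, start + (2:5) * 600),
  P = c(3, 5, 4, 6, 8, 7, 9, 10))

# compare each time stamp with the min / max of its own ID
t_num  <- as.numeric(df$Date.time)
is_min <- t_num == ave(t_num, df$ID, FUN = min)
is_max <- t_num == ave(t_num, df$ID, FUN = max)

# keep only the first and last observation of every ID
df_minmax <- df[is_min | is_max, ]
df_minmax
```

The resulting `df_minmax` can be passed to `geom_point(data = df_minmax, aes(shape = ID))` in the same way as the ddply result.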
If you wish to control the shapes, have a look at ?scale_shape_discrete (the help page includes examples).
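As a rough sketch of controlling the shapes by hand, the example below uses `scale_shape_manual()` with explicit symbol codes. The toy data and the chosen codes (16 and 17) are assumptions for illustration only. Note that ggplot2's default shape palette handles at most 6 discrete values, so with 10 IDs a manual scale with 10 values would probably be needed anyway.

```r
library(ggplot2)

# toy data, invented for illustration
df <- data.frame(
  ID = rep(c("a", "b"), each = 4),
  Date.time = rep(seq(as.POSIXct("2013-07-03 12:10:00", tz = "UTC"),
                      by = "10 min", length.out = 4), 2),
  P = c(3, 5, 4, 6, 8, 7, 9, 10))

# first and last time stamp (identical for both IDs in this toy data)
df_minmax <- subset(df, Date.time == min(Date.time) | Date.time == max(Date.time))

gg <- ggplot(df, aes(x = Date.time, y = P)) +
  geom_line(aes(group = ID)) +                              # no colour mapping: black-and-white friendly
  geom_point(data = df_minmax, aes(shape = ID), size = 3) +
  scale_shape_manual(values = c(a = 16, b = 17))            # 16 = filled circle, 17 = filled triangle
gg
```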
Edit following the updated question
For each ID, add shapes to the line at some interval.
# create a slightly larger data set
df <- data.frame(
ID = rep(c("a", "b"), each = 100),
Date.time = rep(seq(Sys.time(), by = "day", length.out = 100), 2),
P = c(sample(1:10, 100, replace = TRUE), sample(11:20, 100, replace = TRUE)))
# for each ID:
# create a time sequence from min(time) to max(time), by some time step
# e.g. a week
df_gap <- ddply(.data = df, .variables = .(ID), summarize,
Date.time =
seq(from = min(Date.time), to = max(Date.time), by = "week"))
# add P from df to df_gap
df_gap <- merge(x = df_gap, y = df)
gg <- ggplot(data = df, aes(x = Date.time, y = P)) +
geom_line(aes(group = ID, colour = ID)) +
geom_point(data = df_gap, aes(shape = ID)) +
# if your gaps are not a multiple of the length of the data
# you may wish to add the max points as well
# (recompute df_minmax for this larger df first, using the ddply call shown earlier)
geom_point(data = df_minmax, aes(shape = ID))
gg
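The question asks for one symbol every fixed number of values (e.g. every 500) rather than at a fixed time step. A minimal base R sketch of that variant, assuming the rows are already sorted by time within each ID; the step `n = 25` and the toy data are made up for illustration:

```r
# toy data, invented for illustration: 100 observations per ID
df <- data.frame(
  ID = rep(c("a", "b"), each = 100),
  Date.time = rep(seq(as.POSIXct("2013-07-03 12:10:00", tz = "UTC"),
                      by = "10 min", length.out = 100), 2),
  P = c(1:100, 101:200))

n <- 25  # hypothetical step: one symbol every n-th observation

# running row number within each ID (1, 2, 3, ... restarting per ID)
row_in_id <- ave(seq_len(nrow(df)), df$ID, FUN = seq_along)

# keep rows 1, n + 1, 2n + 1, ... of every ID
df_every_n <- df[row_in_id %% n == 1, ]
```

`df_every_n` can then be used in place of `df_gap`, i.e. `geom_point(data = df_every_n, aes(shape = ID))`. Because the points are real rows of `df`, no merge is needed.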
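Answer 1: (score: 1)
The error stems from the fact that the single value min(Date.time) does not match the length of the vectors P or ID. Another issue may be that you re-declare your data argument even though you already have ggplot(df).
The solution that immediately comes to mind is to find the row indices of the minimum and maximum dates. If all IDs share the same minimum and maximum time stamps, this is simple. Use the which() function to get the vector of row numbers you need.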
min.index <- which(df$Date.time == min(df$Date.time))
max.index <- which(df$Date.time == max(df$Date.time))
Then use these vectors as indices.
geom_point(aes(x=Date.time[min.index], y=P[min.index], shape=ID[min.index]))+
geom_point(aes(x=Date.time[max.index], y=P[max.index], shape=ID[max.index]))
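Depending on the ggplot2 version, indexing inside aes() may be rejected, because the aesthetics then no longer have the same length as the data. A hedged alternative sketch that uses the same indices to subset the rows and passes them via the data argument instead (the toy data are invented for illustration):

```r
library(ggplot2)

# toy data, invented for illustration
df <- data.frame(
  ID = rep(c("a", "b"), each = 4),
  Date.time = rep(seq(as.POSIXct("2013-07-03 12:10:00", tz = "UTC"),
                      by = "10 min", length.out = 4), 2),
  P = c(3, 5, 4, 6, 8, 7, 9, 10))

# row numbers of the earliest and latest time stamps
min.index <- which(df$Date.time == min(df$Date.time))
max.index <- which(df$Date.time == max(df$Date.time))

gg <- ggplot(df, aes(x = Date.time, y = P)) +
  geom_line(aes(group = ID, colour = ID)) +
  # subset the rows instead of indexing inside aes()
  geom_point(data = df[c(min.index, max.index), ], aes(shape = ID))
gg
```

As in the answer, this only marks every line when all IDs share the same minimum and maximum time stamps.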