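I have several months of data, with readings taken every second of each day. There are a few missing values. The data is in an R data frame of this form:
date value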
2015-01-01 100
2015-01-01 300
2015-01-01 350
2015-02-01 400
2015-02-01 50
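In my code, this data frame is called "combined" and contains combined$time (the date) and combined$value (the value). I want to plot the values by day, showing the number of instances in each value range, with the values binned into quintiles (e.g. the number of values between 100 and 200, between 200 and 300, and so on, for each day). I have already defined the bin boundaries as lowlimit, uplimit, etc. In the plot, I want the size of each point to correspond to the number of instances in that range on that day.
(I made an example picture of what the plot should look like, but I don't have enough reputation points to post it yet!)
I certainly haven't written the most efficient way of doing this, but my main question is how to actually generate the plot, now that I have managed to bin the values by day. I would also welcome any suggestions for a better approach. Here is my code so far: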
lim <- c(lowlimit, midlowlimit, midupperlimit, uplimit)
bin <- c(0, 0, 0, 0, 0)  # five bins: four limits plus everything above uplimit
# for() advances i on its own, so no manual i = i + 1 is needed
for (i in 2:length(combined$value)) {
    if (is.finite(combined$value[i])) {  # account for NA values
        if (combined$time[i] == combined$time[i - 1]) {
            if (combined$value[i] <= lowlimit) {
                bin[1] <- bin[1] + 1
            } else if (combined$value[i] > lowlimit && combined$value[i] <= midlowlimit) {
                bin[2] <- bin[2] + 1
            } else if (combined$value[i] > midlowlimit && combined$value[i] <= midupperlimit) {
                bin[3] <- bin[3] + 1
            } else if (combined$value[i] > midupperlimit && combined$value[i] <= uplimit) {
                bin[4] <- bin[4] + 1
            } else if (combined$value[i] > uplimit) {
                bin[5] <- bin[5] + 1
            }
        } else {
            ### I know the plotting portion here is incorrect ###
            for (j in 1:5) {
                ggplot(combined$date[i], lim[j]) + geom_point(aes(size = bin[j]))
            }
        }
    }
}
I would really appreciate any help you can offer!
Answer 0 (score: 1)
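Here is my attempt. I hope I have read your question correctly. It seems that you want to create five groups for each day using cut(). Then you want to count how many data points fall into each group, and you want to do this for every day. I created some sample data to demonstrate what I did.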
mydf <- data.frame(Date = as.Date(c("2015-01-01", "2015-01-01", "2015-01-01", "2015-01-01",
                                    "2015-01-02", "2015-01-02", "2015-01-02", "2015-01-02"),
                                  format = "%Y-%m-%d"),
                   Value = c(90, 300, 350, 430, 210, 330, 410, 500),
                   stringsAsFactors = FALSE)
### This is necessary later when you use left_join().
foo <- expand.grid(Date = as.Date(c("2015-01-01", "2015-01-02"), format = "%Y-%m-%d"),
                   group = c("a", "b", "c", "d", "e"))
library(dplyr)
library(ggplot2)
library(scales)
### You group your data by Date, and create five subgroups using cut().
### Then, you count how many data points exist for each date by group.
### This is done with count(). In this case, there are some subgroups
### which have no data points. They do not exist in the data frame that
### count() returns. So you use left_join() with foo, which has all
### possible combinations of Date and group. Once you join the two data
### frames, you replace NA with 0, which is done in the last mutate().
out <- mydf %>%
    group_by(Date) %>%
    mutate(group = cut(Value, breaks = c(0, 100, 200, 300, 400, 500),
                       labels = c("a", "b", "c", "d", "e"))) %>%
    count(Date, group) %>%
    left_join(foo, ., by = c("Date", "group")) %>%
    rename(Total = n) %>%
    mutate(Total = replace(Total, is.na(Total), 0))
### Time to draw a figure
ggplot(data = out, aes(x = Date, y = Total, size = Total, color = group)) +
geom_point() +
scale_x_date(breaks = "1 day")
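Before relying on the counts plotted above, it may help to check how cut() treats values that sit exactly on a bin edge. The sketch below uses the same breaks and labels as the code above; the vector x is made up purely for illustration. By default the intervals are closed on the right, so 100 lands in group a rather than b, and a value equal to the lowest break (0 here) becomes NA unless include.lowest = TRUE is set. Missing readings stay NA either way.

```r
# Same breaks and labels as in the answer; x is hypothetical test input
breaks <- c(0, 100, 200, 300, 400, 500)
labels <- c("a", "b", "c", "d", "e")
x <- c(0, 90, 100, 101, 500, NA)

# Default: intervals are (0,100], (100,200], ... so 0 itself is not covered
default_bins <- cut(x, breaks = breaks, labels = labels)

# include.lowest = TRUE makes the first interval [0,100]
closed_bins <- cut(x, breaks = breaks, labels = labels, include.lowest = TRUE)

print(as.character(default_bins))  # NA, a, a, b, e, NA
print(as.character(closed_bins))   # a, a, a, b, e, NA
```

If your real data can contain values equal to the smallest limit, adding include.lowest = TRUE to the cut() call is probably what you want.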
If you want to modify the y-axis, you can use scale_y_continuous(). I hope this helps.
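The question describes a slightly different layout: one point per day and value range, with the point size showing the count. A minimal sketch of that variant follows, assuming the same sample values as above. The range labels ("0-100" and so on) and the use of base R's table() for counting are illustrative choices of mine, not something from the question. Note that the y-axis is discrete in this version, so scale_y_continuous() would not apply to it.

```r
library(ggplot2)

# Sample data with the same values as the answer's mydf
mydf <- data.frame(
  Date  = as.Date(c("2015-01-01", "2015-01-01", "2015-01-01", "2015-01-01",
                    "2015-01-02", "2015-01-02", "2015-01-02", "2015-01-02")),
  Value = c(90, 300, 350, 430, 210, 330, 410, 500)
)

# Bin the values; the labels spell out each range (illustrative naming)
mydf$group <- cut(mydf$Value,
                  breaks = c(0, 100, 200, 300, 400, 500),
                  labels = c("0-100", "100-200", "200-300", "300-400", "400-500"))

# table() counts every day/bin combination, including empty ones
counts <- as.data.frame(table(Date = mydf$Date, group = mydf$group),
                        responseName = "Total")
counts$Date <- as.Date(as.character(counts$Date))

# Drop empty combinations so that no zero-count point is drawn
counts <- counts[counts$Total > 0, ]

# One point per day and range, sized by the number of readings
p <- ggplot(counts, aes(x = Date, y = group, size = Total)) +
  geom_point() +
  scale_x_date(date_breaks = "1 day")
print(p)
```

With real data, the breaks would be replaced by your own limits (lowlimit, midlowlimit, and so on), with a lower and upper bound added at each end.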