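I'd like an efficient way to plot a data frame with missing values as a line plot in R, along the following lines.
Here is an example of my data frame (edited):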
df <- data.frame("time" = c(1,2,3,4,5),
"case1" = c(NA,2,3,4,NA),
"case2" = c(5,4,3,2,NA),
"case3" = c(4,NA,NA,NA,2))
This is how it works for the first case only:
library(pracma)
df$case1.i <- with(df, interp1(time, case1, time, 'linear'))
library(ggplot2)
ggplot(df, aes(time)) + geom_point(aes(y = case1)) + geom_line(aes(y = case1.i))
I'm trying to work out something that applies this to the roughly 200 columns in my real data frame. So far, this code does not work:
for (i in colnames(df)){
argument <- paste("df$case",i,".i <- with(df, interp1(time, case",i,", time, 'linear'))")
eval(parse(text=argument))
}
Answer 0 (score: 1)
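Read the data into a new zoo object z, apply na.approx to it to fill in the NA values in the interior of the data, and then plot it with ggplot2. Omit facet = NULL if you want separate panels. Note that fortify.zoo with melt = TRUE converts the data to long form with columns Index, Series and Value, which are then used in geom_point. Omit the geom_point(...) part if you only want the lines. See the picture at the end of this answer. The approach shown here is relatively compact and avoids pasting code together and then evaluating it.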
library(ggplot2)
library(zoo)
z <- read.zoo(df)
autoplot(na.approx(z), facet = NULL) +
geom_point(aes(Index, Value, group = Series), fortify(z, melt = TRUE))
Or, if you want a separate plot for each column, try this:
pdf("civy.pdf")
for(i in 1:ncol(z)) {
p <- autoplot(na.approx(z[, i])) +
ylab(names(z)[i]) +
geom_point(aes(Index, Value), fortify(z[, i], melt = TRUE))
plot(p)
}
dev.off()
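To make the "interior of the data" point concrete, here is a minimal sketch of how na.approx treats the gaps in the question's columns. It uses plain numeric vectors rather than the zoo object z, which is an assumption made for brevity; the variable names are illustrative only.

```r
library(zoo)

case1 <- c(NA, 2, 3, 4, NA)   # NAs at both ends
case3 <- c(4, NA, NA, NA, 2)  # NAs only in the interior

# Interior gaps are filled by linear interpolation
filled3 <- na.approx(case3)

# na.rm = FALSE keeps the original length; the NAs at the ends cannot be
# interpolated and therefore stay NA
filled1 <- na.approx(case1, na.rm = FALSE)

print(filled3)
print(filled1)
```

Leading and trailing NAs have no neighbour on one side, so na.approx leaves them alone; that is why the lines in the plot start and stop at the first and last observed point of each series.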
Answer 1 (score: 1)
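Here are two solutions: one plots all the data together, distinguished by colour; the other plots each case separately in its own facet. The principle is essentially the same in both: I use approx for linear interpolation, reshape the data from wide to long format so that it is easier to plot in ggplot2, and then plot it. In the second solution I also create a new variable called type to distinguish the interpolated data from the raw data.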
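# Load packages (magrittr provides %<>%, dplyr mutate_at/filter, tidyr gather)
library(magrittr)
library(dplyr)
library(tidyr)
library(ggplot2)
# Create data frame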
df <- data.frame("time" = c(1,2,3,4,5),
"case1" = c(NA,2,3,4,NA),
"case2" = c(1,2,3,4,NA),
"case3" = c(1,NA,NA,NA,5))
# Perform interpolation on all columns
# Switch from wide to long format
df %<>%
mutate_at(vars(contains("case")), list(interp = ~ approx(time, ., xout = time)$y)) %>%
gather(var, val, -time)
# Plot results all in one figure
g <- ggplot()
g <- g + geom_point(data = df %>% filter(!grepl("interp", var)), aes(x = time, y = val, colour = var))
g <- g + geom_line(data = df %>% filter(grepl("interp", var)), aes(x = time, y = val, colour = var))
print(g)
# Create data frame
df <- data.frame("time" = c(1,2,3,4,5),
"case1" = c(NA,2,3,4,NA),
"case2" = c(1,2,3,4,NA),
"case3" = c(1,NA,NA,NA,5))
# Perform interpolation on all columns
# Switch from wide to long format
# Create column to indicate whether raw or interpolated
# Strip "_interp" from var
df %<>%
mutate_at(vars(contains("case")), list(interp = ~ approx(time, ., xout = time)$y)) %>%
gather(var, val, -time) %>%
mutate(type = ifelse(grepl("interp", var), "interp", "raw"),
var = gsub("_.*", "", var))
# Plot results all separate figures
g <- ggplot()
g <- g + geom_point(data = df %>% filter(type == "raw"), aes(x = time, y = val))
g <- g + geom_line(data = df %>% filter(type == "interp"), aes(x = time, y = val))
g <- g + facet_grid(var ~.)
print(g)
df <- data.frame("time" = c(1,2,3,4,5),
"case1" = c(NA,2,3,4,NA),
"case2" = c(5,4,3,2,NA),
"case3" = c(4,NA,NA,NA,2))
df %<>%
mutate_at(vars(contains("case")), list(interp = ~ approx(time, ., xout = time)$y)) %>%
gather(var, val, -time) %>%
mutate(type = ifelse(grepl("interp", var), "interp", "raw"),
var = gsub("_.*", "", var))
g <- ggplot()
g <- g + geom_point(data = df %>% filter(type == "raw"), aes(x = time, y = val, colour = var))
g <- g + geom_line(data = df %>% filter(type == "interp"), aes(x = time, y = val, colour = var))
print(g)
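The interpolation step that all of the pipelines above rely on can be checked in isolation. The following is a small base R sketch using two columns of the question's edited data; the names case1_interp and case3_interp are illustrative and not part of the answer's code.

```r
# Example data from the question: time plus two of the case columns
df <- data.frame(time  = c(1, 2, 3, 4, 5),
                 case1 = c(NA, 2, 3, 4, NA),
                 case3 = c(4, NA, NA, NA, 2))

# approx() drops the NA pairs and interpolates linearly between the
# remaining points; xout = time asks for a value at every time point
case1_interp <- approx(df$time, df$case1, xout = df$time)$y
case3_interp <- approx(df$time, df$case3, xout = df$time)$y

print(case1_interp)  # NAs at both ends stay NA (default rule = 1)
print(case3_interp)  # interior gap is filled: 4.0 3.5 3.0 2.5 2.0
```

Because the default rule = 1 returns NA outside the range of the observed points, the lines are never extrapolated past the first or last real observation.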
Answer 2 (score: 1)
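You were on the right track, although there are a few mistakes in how you paste together the argument to be evaluated. Off the top of my head:
- use paste0() to get rid of the spaces that paste() inserts
- i is treated as a number, but the loop iterates over column names
Here is the code with the changes I mentioned above: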
cols_to_interpolate <- setdiff(colnames(df), 'time')
for (col in cols_to_interpolate){
#print(col)
argument <- paste0("df$", col,"_i <- with(df, interp1(time, ", col,", time , 'linear'))")
#print(argument)
eval(parse(text=argument))
}
p <- ggplot (df, aes(x = time))
for (col in cols_to_interpolate){
p <- p +
geom_point(aes_string(y = col, color = shQuote(col)), na.rm = TRUE) +
geom_line(aes_string(y = paste0(col,"_i"), color = shQuote(col)), na.rm = TRUE)
}
p + ylab('Y Label') + xlab('X Label')
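Note: I chose this approach because it is the closest to what you were trying to do, but I'm sure there are plenty of more efficient ways to get to the final result. (Reducing the loops would certainly be a plus.)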
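As one possible way of reducing the loops, here is a sketch that interpolates every column in a single lapply call and avoids eval(parse(...)) altogether. It is an assumption-laden variant rather than the code above: it swaps pracma's interp1 for base R's approx and uses base graphics (matplot, matlines) instead of ggplot2 so that it runs without extra packages.

```r
# Example data from the question
df <- data.frame(time  = c(1, 2, 3, 4, 5),
                 case1 = c(NA, 2, 3, 4, NA),
                 case2 = c(5, 4, 3, 2, NA),
                 case3 = c(4, NA, NA, NA, 2))

cols <- setdiff(colnames(df), "time")
interp_cols <- paste0(cols, "_i")

# Interpolate all columns at once; approx() stands in for interp1() here
df[interp_cols] <- lapply(df[cols], function(y) {
  approx(df$time, y, xout = df$time)$y
})

# Raw values as points, interpolated values as lines; both calls cycle
# through the same default colours, so each case keeps one colour
matplot(df$time, as.matrix(df[cols]), type = "p", pch = 16,
        xlab = "X Label", ylab = "Y Label")
matlines(df$time, as.matrix(df[interp_cols]), lty = 1)
```

The same lapply line should scale to the 200 or so columns of the real data frame, since cols is derived from the column names rather than typed out.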