
Question

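I'm looking for an efficient way to plot a data frame with missing values as a line plot in R, following these rules:

NAs before the first and after the last actual value are omitted entirely (no line, no point)
NAs between actual values are replaced by linearly interpolated values on the line (no point shown)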
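For a single column, the two rules above amount to linear interpolation without extrapolation. A minimal base-R sketch, using the question's example values and a hypothetical helper fill_interior(): approx() with its default rule = 1 returns NA outside the range of observed values, so leading and trailing NAs stay missing while interior gaps are filled. Points would then be drawn only where the original value is not NA.

```r
# Example columns from the question
time  <- c(1, 2, 3, 4, 5)
case1 <- c(NA, 2, 3, 4, NA)
case3 <- c(4, NA, NA, NA, 2)

# Hypothetical helper: approx() drops NA pairs, interpolates linearly at xout,
# and with the default rule = 1 returns NA outside the observed range
fill_interior <- function(x, y) approx(x, y, xout = x)$y

case1_line <- fill_interior(time, case1)  # ends stay NA, no line there
case3_line <- fill_interior(time, case3)  # interior gap is filled linearly

# Points should only be drawn where the original value was observed
case3_points <- !is.na(case3)
```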

Here is an example of my data frame (edited):

df <- data.frame("time" = c(1,2,3,4,5),
             "case1" = c(NA,2,3,4,NA),
             "case2" = c(5,4,3,2,NA),
             "case3" = c(4,NA,NA,NA,2))

This is how it works for the first case only:

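library(pracma)
library(ggplot2)
# Linearly interpolate case1 at every time point
df$case1.i <- with(df, interp1(time, case1, time, 'linear'))
# Points for the raw values, a line for the interpolated values
ggplot(df, aes(time)) + geom_point(aes(y = case1)) + geom_line(aes(y = case1.i))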

I'm trying to come up with something that works for the roughly 200 columns in my actual data frame. So far, this code doesn't seem to work:

for (i in colnames(df)){
  argument <- paste("df$case",i,".i <- with(df, interp1(time, case",i,", time, 'linear'))")
  eval(parse(text=argument))
}

Answer 1

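Read the data into a zoo object z, apply na.approx to fill in the NAs in the body of the data, and then plot it with ggplot2. Omit facet = NULL if you want separate panels. Note that fortify.zoo with melt = TRUE converts the data to long format with Index, Series and Value columns, which is what geom_point uses. If you only want the lines, omit the geom_point(...) part. See the image at the end of this answer. The approach shown here is relatively compact and avoids pasting code together and then evaluating it.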

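library(ggplot2)
library(zoo)

# Use the first column (time) as the index and the case columns as series
z <- read.zoo(df)
# Lines: interior NAs filled by na.approx; points: only the original observations
autoplot(na.approx(z), facet = NULL) + 
  geom_point(aes(Index, Value, group = Series), fortify(z, melt = TRUE))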
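To see what na.approx does to a single series, here is a small sketch assuming the zoo package is installed and using case1 and case3 from the question. Interior NAs are filled linearly; leading and trailing NAs cannot be interpolated, so they are trimmed by default (na.rm = TRUE) or kept as NA with na.rm = FALSE, which is why no line is drawn there.

```r
library(zoo)

# case1 from the question, indexed by time
z1 <- zoo(c(NA, 2, 3, 4, NA), order.by = 1:5)

# Keep the full length: leading/trailing NAs remain NA
filled <- na.approx(z1, na.rm = FALSE)

# Default na.rm = TRUE trims the NAs that could not be interpolated
trimmed <- na.approx(z1)

# case3 from the question: the interior gap is filled linearly
z3 <- zoo(c(4, NA, NA, NA, 2), order.by = 1:5)
filled3 <- na.approx(z3)
```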

Or, if you want a separate plot for each column, try this:

pdf("civy.pdf")

for(i in 1:ncol(z)) {
  p <- autoplot(na.approx(z[, i])) + 
    ylab(names(z)[i]) +
    geom_point(aes(Index, Value), fortify(z[, i], melt = TRUE))
  plot(p)
}

dev.off()

Answer 2

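Here are two solutions: one plots all the data together, distinguished by colour; the other plots each case separately in its own facet. The principle is the same in both: I use approx for linear interpolation, reshape the data from wide to long to make plotting in ggplot2 easier, and then plot. In the second solution I also create a new variable called type to distinguish the interpolated data from the raw data.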

Plotted together

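# Load packages: dplyr (mutate_at, filter), tidyr (gather),
# magrittr (%<>%) and ggplot2
library(dplyr)
library(tidyr)
library(magrittr)
library(ggplot2)

# Create data frame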
df <- data.frame("time" = c(1,2,3,4,5),
                 "case1" = c(NA,2,3,4,NA),
                 "case2" = c(1,2,3,4,NA),
                 "case3" = c(1,NA,NA,NA,5)) 

# Perform interpolation on all columns
# Switch from wide to long format
df %<>% 
  mutate_at(vars(contains("case")), funs(interp = approx(time, ., xout = time)$y)) %>% 
  gather(var, val, -time)

# Plot results all in one figure
g <- ggplot() 
g <- g + geom_point(data = df %>% filter(!grepl("interp", var)), aes(x = time, y = val, colour = var))
g <- g + geom_line(data = df %>% filter(grepl("interp", var)), aes(x = time, y = val, colour = var))
print(g)

Plotted separately

# Create data frame
df <- data.frame("time" = c(1,2,3,4,5),
                 "case1" = c(NA,2,3,4,NA),
                 "case2" = c(1,2,3,4,NA),
                 "case3" = c(1,NA,NA,NA,5)) 

# Perform interpolation on all columns
# Switch from wide to long format
# Create column to indicate whether raw or interpolated
# Strip "_interp" from var
df %<>% 
  mutate_at(vars(contains("case")), funs(interp = approx(time, ., xout = time)$y)) %>% 
  gather(var, val, -time) %>% 
  mutate(type = ifelse(grepl("interp", var), "interp", "raw"),
         var = gsub("_.*", "", var))

# Plot results in separate facets, one per case
g <- ggplot() 
g <- g + geom_point(data = df %>% filter(type == "raw"), aes(x = time, y = val))
g <- g + geom_line(data = df %>% filter(type == "interp"), aes(x = time, y = val))
g <- g + facet_grid(var ~.)
print(g)

Edit: with the new data frame

df <- data.frame("time" = c(1,2,3,4,5),
                 "case1" = c(NA,2,3,4,NA),
                 "case2" = c(5,4,3,2,NA),
                 "case3" = c(4,NA,NA,NA,2))

df %<>% 
  mutate_at(vars(contains("case")), funs(interp = approx(time, ., xout = time)$y)) %>% 
  gather(var, val, -time) %>% 
  mutate(type = ifelse(grepl("interp", var), "interp", "raw"),
         var = gsub("_.*", "", var))

g <- ggplot() 
g <- g + geom_point(data = df %>% filter(type == "raw"), aes(x = time, y = val, colour = var))
g <- g + geom_line(data = df %>% filter(type == "interp"), aes(x = time, y = val, colour = var))
print(g)
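For comparison, the same interpolate-then-reshape idea can be sketched in base R without dplyr, tidyr or magrittr. This is an illustrative sketch, not the answer's code: interp_long() is a hypothetical helper, and the column names var, val and type are chosen to mirror the answer.

```r
df <- data.frame(time  = c(1, 2, 3, 4, 5),
                 case1 = c(NA, 2, 3, 4, NA),
                 case2 = c(5, 4, 3, 2, NA),
                 case3 = c(4, NA, NA, NA, 2))

# Hypothetical helper: long-format rows holding both raw and interpolated values
interp_long <- function(df, cols) {
  pieces <- lapply(cols, function(col) {
    raw <- df[[col]]
    # Linear interpolation at every time point; NA outside the observed range
    interp <- approx(df$time, raw, xout = df$time)$y
    data.frame(time = rep(df$time, 2),
               var  = col,
               val  = c(raw, interp),
               type = rep(c("raw", "interp"), each = nrow(df)))
  })
  do.call(rbind, pieces)
}

long <- interp_long(df, setdiff(names(df), "time"))
```

Points would then come from the rows with type == "raw" and lines from the rows with type == "interp", as in the ggplot2 code above.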

Answer 3

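You're on the right track, but there are a few mistakes in how you paste together the argument to evaluate. Off the top of my head:

You should use paste0() so that no spaces are inserted between the pieces
You loop over the column names but then use i as if it were a number (so "case" gets prepended to a name that already contains it)
I would loop only over the columns you want to interpolate, rather than over all columns (which include time)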
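The first two points become visible when you print the string the loop builds for a single column. A small base-R sketch, using "case1" as an example value of i; tryCatch() around parse() is used here only to check whether each string is valid R syntax.

```r
i <- "case1"

# Original approach: paste() separates pieces with spaces, and i is already
# the full column name, so "case" is prepended a second time
bad <- paste("df$case", i, ".i <- with(df, interp1(time, case", i, ", time, 'linear'))")

# Corrected: paste0() adds no separators and uses the column name directly
good <- paste0("df$", i, "_i <- with(df, interp1(time, ", i, ", time, 'linear'))")

print(bad)
print(good)

# The bad string is not valid R syntax; the good one parses
bad_parses  <- !inherits(tryCatch(parse(text = bad),  error = function(e) e), "error")
good_parses <- !inherits(tryCatch(parse(text = good), error = function(e) e), "error")
```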

Here is the code with the changes mentioned above:

library(pracma)
library(ggplot2)

# Interpolate every column except the time column
cols_to_interpolate <- setdiff(colnames(df), 'time')

for (col in cols_to_interpolate){
  #print(col)
  argument <- paste0("df$", col,"_i <- with(df, interp1(time, ", col,", time , 'linear'))")
  #print(argument)
  eval(parse(text=argument))
}

# One point layer (raw values) and one line layer (interpolated values) per column
p <- ggplot(df, aes(x = time))
for (col in cols_to_interpolate){
  p <- p + 
    geom_point(aes_string(y = col, color = shQuote(col)),  na.rm = TRUE) + 
    geom_line(aes_string(y = paste0(col,"_i"), color = shQuote(col)), na.rm = TRUE)
}
p + ylab('Y Label') + xlab('X Label')

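Note: I chose this approach because it is the closest to what you were trying to do, but I'm sure there are much more efficient ways to get the final result (and fewer loops is certainly a plus).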
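One such loop-free alternative, sketched under the assumption that every column except time should be interpolated: build all the new _i columns at once with lapply() and base approx(), swapped in here for pracma's interp1(), so no code strings are pasted or evaluated. It also assumes each column has at least two non-missing values, since approx() stops with an error otherwise.

```r
df <- data.frame(time  = c(1, 2, 3, 4, 5),
                 case1 = c(NA, 2, 3, 4, NA),
                 case2 = c(5, 4, 3, 2, NA),
                 case3 = c(4, NA, NA, NA, 2))

cols <- setdiff(names(df), "time")

# Interpolate every column at once; approx() leaves leading/trailing NAs as NA
df[paste0(cols, "_i")] <- lapply(df[cols], function(y) approx(df$time, y, xout = df$time)$y)
```

The resulting case1_i, case2_i, ... columns follow the same naming as the loop above, so the plotting loop can be reused unchanged.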

Quickly interpolating missing values in an R plot
