Loop through and plot columns of two identical dataframes

Time: 2018-05-28 18:57:00

Tags: r plot ggplot2 apply

I have two dataframes I'd like to plot against each other:

> df1 <- data.frame(HV = c(3,3,3), NAtlantic850t = c(0.501, 1.373, 1.88), AO = c(-0.0512, 0.2892, 0.0664))

> df2 <- data.frame(HV = c(3,3,2), NAtlantic850t = c(1.2384, 1.3637, -0.0332), AO = c(-0.5915, -0.0596, -0.8842))

They have the same structure (identical column names), and I'd like to plot them column against column (e.g. df1$HV vs. df2$HV): loop through the dataframe columns and plot each pair against each other in a scatter plot.

I've looked through 20+ questions asking similar things and can't figure it out - would appreciate some help on where to start. Can I use lapply and plot or ggplot when they're two DFs? Should I merge them first?

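3 answers:

Answer 0: (score: 1)

As you suggested, I would indeed first rearrange the data into a list of plottable data frames before calling the plot command. I think this is the best option if you want to supply the data argument to ggplot. Something like: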

plot_dfs <- lapply(names(df1),function(nm)data.frame(col1 = df1[,nm], col2 = df2[,nm]))
for (df in plot_dfs)plot(x = df[,"col1"], y = df[,"col2"])

Or with ggplot:

library(ggplot2)

for (df in plot_dfs){
  # print() is needed so the plot is drawn inside a for loop
  print(
    ggplot(data = df, aes(x=col1, y=col2)) +
      geom_point())}

If you'd like to add the column name as the plot title, you can do the following:

for (idx in seq_along(plot_dfs)){
  print(
    ggplot(data = plot_dfs[[idx]], aes(x=col1, y=col2)) +
      ggtitle(names(df1)[idx]) +
      geom_point())}

Answer 1: (score: 1)

You can loop through the columns like this:

for(col in seq_len(ncol(df1))){
  plot(df1[,col], df2[,col])
}

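Before running this, make sure both data frames have the same number of columns (and that the columns are in the same order).

Answer 2: (score: 0)

Here is one way to do it: loop over the column indices and create the plots one at a time, adding each to a list and writing each one to a file: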
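
One way to make that check explicit is sketched below. It reuses df1 and df2 from the question; the stopifnot() guard, the same_layout helper variable, and the axis labels are my additions for illustration, not part of the answer above.

```r
# Data from the question
df1 <- data.frame(HV = c(3, 3, 3),
                  NAtlantic850t = c(0.501, 1.373, 1.88),
                  AO = c(-0.0512, 0.2892, 0.0664))
df2 <- data.frame(HV = c(3, 3, 2),
                  NAtlantic850t = c(1.2384, 1.3637, -0.0332),
                  AO = c(-0.5915, -0.0596, -0.8842))

# TRUE only if both data frames have the same column names, in the same
# order, and the same number of rows (same_layout is a hypothetical name)
same_layout <- identical(names(df1), names(df2)) && nrow(df1) == nrow(df2)

# Stop early instead of silently plotting mismatched columns
stopifnot(same_layout)

for (col in seq_along(df1)) {
  plot(df1[[col]], df2[[col]],
       xlab = "df1", ylab = "df2",
       main = names(df1)[col])
}
```

If the check fails, stopifnot() raises an error before any plot is drawn, which is usually easier to diagnose than a scatter plot of two unrelated columns.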

library(ggplot2)

# create some data to plot 
df1 <- iris[, sapply(iris, is.numeric)]
df2 <- iris[sample(1:nrow(iris)), sapply(iris, is.numeric)]

# a list to catch each plot object 
plot_list <- vector(mode="list", length=ncol(df1))

for (idx in seq_along(df1)){

  plot_list[[idx]] <- ggplot2::qplot(df1[[idx]], df2[[idx]]) + 
    labs(title=names(df1)[idx])

  ggsave(filename=paste0(names(df1)[idx], ".pdf"), plot=plot_list[[idx]])
}

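As you suggested in your question, you can also use s/lapply() with an anonymous function, e.g. like this (although here we aren't storing the plots, just writing each one to disk):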

lapply(seq_along(df1), function(idx){
  the_plot <- ggplot2::qplot(df1[[idx]], df2[[idx]]) + labs(title=names(df1)[idx])
  ggsave(filename=paste0(names(df1)[idx], ".pdf"), plot=the_plot)
})

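If you want to keep a list of the plots (as in the for-loop example), just assign the result of lapply() to a variable (e.g. plot_list) and add a line like return(the_plot) before closing the function.

Depending on your goal, you can modify/adapt this approach in many ways.

Hope this helps.

ps If the columns might not be in the same order, it is better to loop over column names rather than column indices (i.e. use for (colname in names(df1)){... instead of for (idx in seq_along(df1)){...). You can use the same [[ subsetting syntax for both names and indices.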
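
A minimal sketch of that variant, assuming the same iris-based example data as above. It uses ggplot() + geom_point() in place of qplot(), which newer ggplot2 releases mark as deprecated; the helper data frame plot_df and the naming of the list elements are my own additions.

```r
library(ggplot2)

# Same example data as in the for-loop version
df1 <- iris[, sapply(iris, is.numeric)]
df2 <- iris[sample(nrow(iris)), sapply(iris, is.numeric)]

# lapply() collects whatever the anonymous function returns,
# so returning the plot object yields a list of plots
plot_list <- lapply(seq_along(df1), function(idx) {
  plot_df <- data.frame(x = df1[[idx]], y = df2[[idx]])
  the_plot <- ggplot(plot_df, aes(x = x, y = y)) +
    geom_point() +
    labs(title = names(df1)[idx])
  return(the_plot)
})

# Name the list elements after the columns so plots can be looked up by name
names(plot_list) <- names(df1)
```

A single plot can then be drawn with, for example, print(plot_list[["Sepal.Length"]]), or written to disk later with ggsave().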
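
A sketch of the name-based loop, using the question's data but with the columns of df2 deliberately reordered to show why names are safer than positions. The shared_cols variable and the use of intersect() are assumptions of this example, not something the answer specifies.

```r
# Data from the question; df2's columns are reordered here for illustration
df1 <- data.frame(HV = c(3, 3, 3),
                  NAtlantic850t = c(0.501, 1.373, 1.88),
                  AO = c(-0.0512, 0.2892, 0.0664))
df2 <- data.frame(AO = c(-0.5915, -0.0596, -0.8842),
                  HV = c(3, 3, 2),
                  NAtlantic850t = c(1.2384, 1.3637, -0.0332))

# Only loop over columns present in both data frames
# (intersect() keeps the order of its first argument)
shared_cols <- intersect(names(df1), names(df2))

for (colname in shared_cols) {
  # [[ ]] accepts a column name just as it accepts an index
  plot(df1[[colname]], df2[[colname]],
       xlab = "df1", ylab = "df2",
       main = colname)
}
```

Looping by index here would have paired df1$HV with df2$AO; looping by name pairs each column with its namesake regardless of position.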