I have two dataframes I'd like to plot against each other:
> df1 <- data.frame(HV = c(3,3,3), NAtlantic850t = c(0.501, 1.373, 1.88), AO = c(-0.0512, 0.2892, 0.0664))
> df2 <- data.frame(HV = c(3,3,2), NAtlantic850t = c(1.2384, 1.3637, -0.0332), AO = c(-0.5915, -0.0596, -0.8842))
They have identical structure, and I'd like to plot them column vs. column (e.g. df1$HV against df2$HV): loop through the dataframe columns and plot each pair against each other in a scatter plot.
I've looked through 20+ questions asking similar things and can't figure it out - would appreciate some help on where to start. Can I use lapply and plot or ggplot when they're two DFs? Should I merge them first?
Answer 0 (score: 1)
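As you suggested, I would indeed first rearrange the data into a list of plottable data frames before calling the plot command. I think that is the best option if you want to supply the `data` argument to `ggplot`. Something like: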
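# Pair up the same-named columns of df1 and df2, one small data frame per column
plot_dfs <- lapply(names(df1), function(nm) data.frame(col1 = df1[, nm], col2 = df2[, nm]))
# Draw one base-graphics scatter plot per pair
for (df in plot_dfs) plot(x = df[, "col1"], y = df[, "col2"])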
Or using ggplot:
library(ggplot2)
for (df in plot_dfs) {
  # print() is needed for ggplot objects to render inside a loop
  print(
    ggplot(data = df, aes(x = col1, y = col2)) +
      geom_point())
}
If you want to add the column names as plot titles, you can do the following:
for (idx in seq_along(plot_dfs)) {
  print(
    ggplot(data = plot_dfs[[idx]], aes(x = col1, y = col2)) +
      ggtitle(names(df1)[idx]) +
      geom_point())
}
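On the question of whether to merge first: a minimal sketch of one alternative, assuming ggplot2 is installed. It stacks the paired columns into a single long data frame and draws one panel per column with `facet_wrap()`. The names `long_df`, `variable`, and `p` are illustrative and not part of the answer above.

```r
library(ggplot2)

# Data from the question
df1 <- data.frame(HV = c(3, 3, 3), NAtlantic850t = c(0.501, 1.373, 1.88), AO = c(-0.0512, 0.2892, 0.0664))
df2 <- data.frame(HV = c(3, 3, 2), NAtlantic850t = c(1.2384, 1.3637, -0.0332), AO = c(-0.5915, -0.0596, -0.8842))

# Stack the paired columns into one long data frame,
# tagging every row with the column it came from (hypothetical name: variable)
long_df <- do.call(rbind, lapply(names(df1), function(nm) {
  data.frame(variable = nm, col1 = df1[[nm]], col2 = df2[[nm]])
}))

# One panel per original column; free scales because the columns have different ranges
p <- ggplot(long_df, aes(x = col1, y = col2)) +
  geom_point() +
  facet_wrap(~ variable, scales = "free")
print(p)
```

This gives a single figure instead of one plot per column, which may or may not suit your purpose.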
Answer 1 (score: 1)
You can loop over the columns like this:
# seq_len() also behaves correctly if df1 happens to have zero columns
for (col in seq_len(ncol(df1))) {
  plot(df1[, col], df2[, col])
}
Before running this, make sure both data frames have the same number of columns (and that the columns are in the same order).
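The precondition above can be checked in code before plotting. The following is only a sketch in base R, using the data from the question; `df2_shuffled` and `df2_aligned` are hypothetical names introduced to show one way of realigning columns when only their order differs.

```r
# Data from the question
df1 <- data.frame(HV = c(3, 3, 3), NAtlantic850t = c(0.501, 1.373, 1.88), AO = c(-0.0512, 0.2892, 0.0664))
df2 <- data.frame(HV = c(3, 3, 2), NAtlantic850t = c(1.2384, 1.3637, -0.0332), AO = c(-0.5915, -0.0596, -0.8842))

# Stop early if the column names (including their order) differ
stopifnot(identical(names(df1), names(df2)))

# Illustration: a copy of df2 with its columns in a different order ...
df2_shuffled <- df2[, c("AO", "HV", "NAtlantic850t")]
# ... can be realigned by indexing with df1's column names
df2_aligned <- df2_shuffled[, names(df1)]
```

After realigning, looping over column positions pairs up the intended columns again.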
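Answer 2 (score: 0)
Here is one approach: loop over the column indices and create the plots one by one, adding each plot to a list and writing each one to a file: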
library(ggplot2)
# create some data to plot
df1 <- iris[, sapply(iris, is.numeric)]
df2 <- iris[sample(1:nrow(iris)), sapply(iris, is.numeric)]
# a list to catch each plot object
plot_list <- vector(mode="list", length=ncol(df1))
for (idx in seq_along(df1)) {
  plot_list[[idx]] <- ggplot2::qplot(df1[[idx]], df2[[idx]]) +
    labs(title = names(df1)[idx])
  ggsave(filename = paste0(names(df1)[idx], ".pdf"), plot = plot_list[[idx]])
}
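As you suggested in your question, you can also use `s/lapply()` with an anonymous function, for example like this (although here we don't store the plots, we just write each plot to disk):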
lapply(seq_along(df1), function(idx) {
  # index both data frames with idx so the same column is taken from each
  the_plot <- ggplot2::qplot(df1[[idx]], df2[[idx]]) + labs(title = names(df1)[idx])
  ggsave(filename = paste0(names(df1)[idx], ".pdf"), plot = the_plot)
})
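If you want to keep a list of the plots (as in the `for`-loop example), just assign the result of `lapply()` to a variable (e.g. `plot_list`) and add a line like `return(the_plot)` before the end of the function.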
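A sketch of that variant, assuming ggplot2 is installed. Two things differ from the code above and are my own choices: `ggplot()` + `geom_point()` stands in for `qplot()` (which is deprecated in recent ggplot2 releases), and the `ggsave()` call is left out so that nothing is written to disk.

```r
library(ggplot2)

# Same example data as in the for-loop version
df1 <- iris[, sapply(iris, is.numeric)]
df2 <- iris[sample(seq_len(nrow(iris))), sapply(iris, is.numeric)]

# lapply() collects whatever the function returns, so returning the plot keeps it
plot_list <- lapply(seq_along(df1), function(idx) {
  the_plot <- ggplot(data.frame(x = df1[[idx]], y = df2[[idx]]), aes(x = x, y = y)) +
    geom_point() +
    labs(title = names(df1)[idx])
  return(the_plot)
})

# Optional: name the list elements after the columns for easier lookup
names(plot_list) <- names(df1)
```

A single plot can then be shown with, for example, `print(plot_list[["Sepal.Length"]])`.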
Depending on your goal, you can modify or adapt this approach in a number of ways.
Hope this helps!
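P.S. If the columns might not be in the same order, it is better to loop over the column names rather than the column indices (i.e. use `for (colname in names(df1)){...` instead of `for (idx in seq_along(df1)){...`). You can use the same `[[` subsetting syntax with both names and indices.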
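A minimal base-R sketch of the name-based loop, using the data from the question with the columns of df2 deliberately reordered. Restricting the loop to `common_cols` (a hypothetical helper name) is an extra safeguard I have assumed here, not something the answer requires.

```r
# Data from the question; df2's columns are deliberately in a different order
df1 <- data.frame(HV = c(3, 3, 3), NAtlantic850t = c(0.501, 1.373, 1.88), AO = c(-0.0512, 0.2892, 0.0664))
df2 <- data.frame(AO = c(-0.5915, -0.0596, -0.8842), HV = c(3, 3, 2), NAtlantic850t = c(1.2384, 1.3637, -0.0332))

# Only loop over columns that exist in both data frames
common_cols <- intersect(names(df1), names(df2))

for (colname in common_cols) {
  # [[ with a name picks the matching column regardless of its position
  plot(df1[[colname]], df2[[colname]],
       xlab = paste0("df1$", colname), ylab = paste0("df2$", colname),
       main = colname)
}
```

Because each column is looked up by name, the differing column order in df2 no longer matters.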