How to plot an average linear regression of different datasets in a single figure

Time: 2019-05-24 10:00:31

Tags: r ggplot2 regression

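I am analyzing a dataset and want to quantify a linear regression over the whole dataset. The dataset has several dependent variables but a single independent variable.

I tried a simple linear regression using stat_smooth() from the ggplot2 package. This gives a separate regression line for each data series, but what I want is to combine these regression lines into a single regression line that represents them, and that could likewise represent the average of more regression lines.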


    geom_point(aes(x= DateAndTime, y= T_423), na.rm=TRUE, color="purple", shape=19, size=3)+
    geom_point(aes(x= DateAndTime, y= T_422), na.rm=TRUE, color="red", shape=8, size=1)+
    ggtitle("Module Temperature STP423 - Total distribution") +
           xlab("Date") + ylab("Module Temperature (C)")


The data looks like this:

        Dates            X1            X2
1    2014-01-04      8.0645816      7.2969667
2    2014-01-06      7.7804850      7.1507470
3    2014-01-07      8.8772607      8.6917391
4    2014-01-08      8.8943146      8.3475009
5    2014-01-10      11.6734008     10.6493480
6    2014-01-11      9.0915727      8.5793932
7    2014-01-12      9.5216658      9.4891858
8    2014-01-13     -6.2493962     -6.9360515

1 answer:

Answer 0 (score: 0)

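ggplot2 works best with long-format data. For example, geom_smooth fits one model to whatever is mapped to y, so a single fit across all series requires all y values to be in the same column. We therefore need to convert your data to long format. I reused the code from this FAQ specifically (because it is ggplot-related), but another FAQ, How to reshape data from wide to long?, covers several other methods.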
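For readers who do not have reshape2 installed, the wide-to-long step can also be sketched in base R. This is only an illustration on the first three rows of the sample data; the names `wide`, `long`, and `value_cols` are assumptions of this sketch, and the output column names `variable` and `value` are chosen to match what melt() produces.

```r
# Three rows of the sample data in the original wide layout.
wide <- data.frame(
  Dates = as.Date(c("2014-01-04", "2014-01-06", "2014-01-07")),
  X1    = c(8.0645816, 7.7804850, 8.8772607),
  X2    = c(7.2969667, 7.1507470, 8.6917391)
)

# Every column except the id column holds measurements.
value_cols <- setdiff(names(wide), "Dates")

# Stack the measurement columns: one row per (date, series) pair.
long <- data.frame(
  Dates    = rep(wide$Dates, times = length(value_cols)),
  variable = factor(rep(value_cols, each = nrow(wide)), levels = value_cols),
  value    = unlist(wide[value_cols], use.names = FALSE)
)

print(long)
```

The resulting `long` data frame has one `value` column that can be mapped to y, which is the shape ggplot2 expects.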

test_data = read.table(text = '        Dates            X1            X2
1    2014-01-04      8.0645816      7.2969667
2    2014-01-06      7.7804850      7.1507470
3    2014-01-07      8.8772607      8.6917391
4    2014-01-08      8.8943146      8.3475009
5    2014-01-10      11.6734008     10.6493480
6    2014-01-11      9.0915727      8.5793932
7    2014-01-12      9.5216658      9.4891858
8    2014-01-13     -6.2493962     -6.9360515', header = TRUE)

test_data$Dates = as.Date(test_data$Dates)

# code copy/pasted from linked FAQ, only changed id = "date" to id = "Dates"
library("reshape2")
library("ggplot2")

test_data_long <- melt(test_data, id = "Dates")  # convert to long format


# now we can plot:

ggplot(test_data_long, aes(x = Dates, y = value)) +
  geom_point(aes(color = variable, size = variable, shape = variable)) +
  geom_smooth(method = "lm") +  # straight-line fit over all points; the default here would be a loess curve
  labs(title = "Module Temperature STP423 - Total distribution",
       x = "Date",
       y = "Module Temperature (C)") +
  scale_size_manual(values = c(1, 3)) +
  scale_color_manual(values = c("red", "purple")) +
  scale_shape_manual(values = c(8, 19))
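A quick numerical check of what a single pooled line represents may help here. The sketch below uses base R only, re-enters the sample values from the question, and introduces a helper column `day` (days since the first date), which is an assumption of this example and not part of the original code. Because both series share the same dates, the coefficients of one lm() fit over all rows should match the mean of the per-series coefficients. With unequal dates or missing values per series, the two would generally differ.

```r
# Sample data from the question, entered directly in long format.
dates <- as.Date(c("2014-01-04", "2014-01-06", "2014-01-07", "2014-01-08",
                   "2014-01-10", "2014-01-11", "2014-01-12", "2014-01-13"))
x1 <- c(8.0645816, 7.7804850, 8.8772607, 8.8943146,
        11.6734008, 9.0915727, 9.5216658, -6.2493962)
x2 <- c(7.2969667, 7.1507470, 8.6917391, 8.3475009,
        10.6493480, 8.5793932, 9.4891858, -6.9360515)

long <- data.frame(
  Dates    = rep(dates, times = 2),
  variable = rep(c("X1", "X2"), each = length(dates)),
  value    = c(x1, x2)
)

# Hypothetical helper column: days elapsed since the first observation.
long$day <- as.numeric(long$Dates - min(long$Dates))

# One pooled fit over all rows, ignoring which series a point belongs to.
pooled <- coef(lm(value ~ day, data = long))

# One fit per series, then the element-wise mean of the coefficients.
per_series <- sapply(split(long, long$variable),
                     function(d) coef(lm(value ~ day, data = d)))
averaged <- rowMeans(per_series)

print(pooled)
print(averaged)
```

In a plot, the pooled fit corresponds to a single `geom_smooth(method = "lm")` layer with no grouping aesthetic, whereas mapping `color = variable` inside that layer would draw one line per series.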
