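I am analyzing a dataset and want to quantify a linear regression over the whole dataset. The dataset has several dependent variables but a single independent variable.
I tried simple linear regression using stat_smooth() from the ggplot2 package. This gives a separate regression line for each series, but what I want is to combine those lines into a single regression line that represents them all, and that could also stand for the average of further regression lines.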
geom_point(aes(x= DateAndTime, y= T_423), na.rm=TRUE, color="purple", shape=19, size=3)+
geom_point(aes(x= DateAndTime, y= T_422), na.rm=TRUE, color="red", shape=8, size=1)+
ggtitle("Module Temperature STP423 - Total distribution") +
xlab("Date") + ylab("Module Temperature (C)")
The data look like this:
Dates X1 X2
1 2014-01-04 8.0645816 7.2969667
2 2014-01-06 7.7804850 7.1507470
3 2014-01-07 8.8772607 8.6917391
4 2014-01-08 8.8943146 8.3475009
5 2014-01-10 11.6734008 10.6493480
6 2014-01-11 9.0915727 8.5793932
7 2014-01-12 9.5216658 9.4891858
8 2014-01-13 -6.2493962 -6.9360515
Answer 0 (score: 0)
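ggplot2 works best with long-format data. For example, geom_smooth requires all y values to be in the same column, so we need to convert your data to long format. I specifically reused the code from this FAQ (because it relates to ggplot), but another FAQ, How to reshape data from wide to long?, covers several other methods.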
test_data = read.table(text = ' Dates X1 X2
1 2014-01-04 8.0645816 7.2969667
2 2014-01-06 7.7804850 7.1507470
3 2014-01-07 8.8772607 8.6917391
4 2014-01-08 8.8943146 8.3475009
5 2014-01-10 11.6734008 10.6493480
6 2014-01-11 9.0915727 8.5793932
7 2014-01-12 9.5216658 9.4891858
8 2014-01-13 -6.2493962 -6.9360515', header = TRUE)
test_data$Dates = as.Date(test_data$Dates)
# code copy/pasted from linked FAQ, only changed id = "date" to id = "Dates"
library("reshape2")
library("ggplot2")
test_data_long <- melt(test_data, id = "Dates") # convert to long format
# now we can plot:
ggplot(test_data_long, aes(x = Dates, y = value)) +
geom_point(aes(color = variable, size = variable, shape = variable)) +
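# method = "lm" fits one straight line through all points; the default smoother here would be loess
geom_smooth(method = "lm") +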
labs(title = "Module Temperature STP423 - Total distribution",
x = "Date",
y = "Module Temperature (C)") +
scale_size_manual(values = c(1, 3)) +
scale_color_manual(values = c("red", "purple")) +
scale_shape_manual(values = c(8, 19))
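To put numbers on that single line rather than only drawing it, the pooled regression can also be fitted directly with base R's lm(). The following is a minimal sketch, not necessarily how the asker's real data should be modelled: it rebuilds the example data with base R only, and the names day, pooled_fit, fit_x1, fit_x2 and avg_coef are illustrative choices that do not appear in the original post. It also checks one property that bears on the question: when every series is observed on exactly the same dates with no missing values, the pooled least-squares line has the same intercept and slope as the average of the per-series lines.

```r
# Example data from the question, in wide format
test_data <- data.frame(
  Dates = as.Date(c("2014-01-04", "2014-01-06", "2014-01-07", "2014-01-08",
                    "2014-01-10", "2014-01-11", "2014-01-12", "2014-01-13")),
  X1 = c(8.0645816, 7.7804850, 8.8772607, 8.8943146,
         11.6734008, 9.0915727, 9.5216658, -6.2493962),
  X2 = c(7.2969667, 7.1507470, 8.6917391, 8.3475009,
         10.6493480, 8.5793932, 9.4891858, -6.9360515)
)

# Numeric predictor: days elapsed since the first observation
# (hypothetical helper column, so the slope reads as "units of y per day")
test_data$day <- as.numeric(test_data$Dates - min(test_data$Dates))

# Long format built with base R: one row per (date, series) pair
test_data_long <- data.frame(
  day      = rep(test_data$day, times = 2),
  variable = rep(c("X1", "X2"), each = nrow(test_data)),
  value    = c(test_data$X1, test_data$X2)
)

# One regression through all points of all series
pooled_fit <- lm(value ~ day, data = test_data_long)

# Separate regressions per series, then the average of their coefficients
fit_x1 <- lm(X1 ~ day, data = test_data)
fit_x2 <- lm(X2 ~ day, data = test_data)
avg_coef <- (coef(fit_x1) + coef(fit_x2)) / 2

print(coef(pooled_fit))  # intercept and slope of the single pooled line
print(avg_coef)          # average of the per-series intercepts and slopes
```

The two printed coefficient vectors should agree up to floating-point error for this data. If the series have different dates or contain NA values, the pooled line and the averaged line generally differ, and which one is the better summary depends on what the average is meant to represent.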