Preparing data before running a linear regression

Time: 2014-04-24 20:39:59

Tags: r regression lm

My data looks like this:

example <- structure(list(ID = structure(c(1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 
3L), .Label = c("A1", "A2", "A3"), class = "factor"), y = c(44.1160205053166, 
33.0574407376116, 50.5295183433918, 44.1160205053166, 33.0574407376116, 
50.5295183433918, 44.1160205053166, 33.0574407376116, 50.5295183433918
), day = structure(c(1392647220, 1392733620, 1392820020, 1392647220, 
1392733620, 1392820020, 1392647220, 1392733620, 1392820020), class = c("POSIXct", 
"POSIXt"), tzone = ""), P = c(16.345885329647, 6.21615618292708, 
9.89848991157487, 14.4955473870505, 8.47820783441421, 2.36668747442309, 
10.4325918923132, 9.26802998466883, 14.8380589560838), x = c(25.6364896567538, 
10.5067015672103, 12.0306829502806, 25.6364896567538, 10.5067015672103, 
12.0306829502806, 25.6364896567538, 10.5067015672103, 12.0306829502806
)), .Names = c("ID", "y", "day", "P", "x"), row.names = c(NA, 
-9L), class = "data.frame")

I want to regress y on P at day 1, day 2, and day 3. That is:

y ~ P[1] + P[2] + P[3] + x

What is the best way to do this? Do I need to create a new data frame with these variables before running lm, or is there a better approach?

Thanks!

1 answer:

Answer 0: (score: 0)

Use the subset argument of the lm function:

lm(y ~ P, data = example, subset = as.integer(factor(day)) %in% 1:3)
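
If the goal is to have the three P values of each ID enter the model as separate regressors, as in the formula in the question, one possible approach is to reshape the data to one row per ID before calling lm. The following is only a sketch under some assumptions: the data are simulated (the question's sample has just three IDs, too few to estimate five coefficients), the day column is left out, the helper column k and the names long, wide and fit are made up for illustration, and the k-th row of each ID is assumed to be its k-th P measurement.

```r
# Simulated data in the same long layout as the question: three rows per ID,
# with y and x repeated and P changing from row to row. Values are made up.
set.seed(1)
n_id <- 10
long <- data.frame(
  ID = rep(paste0("A", seq_len(n_id)), times = 3),
  y  = rep(rnorm(n_id, mean = 40, sd = 8), times = 3),
  x  = rep(runif(n_id, min = 10, max = 30), times = 3),
  P  = runif(3 * n_id, min = 2, max = 17)
)

# Number the P measurements within each ID (1, 2, 3) in row order.
# Assumption: row order within an ID reflects the measurement order.
long$k <- ave(seq_along(long$ID), long$ID, FUN = seq_along)

# Reshape to one row per ID with columns P.1, P.2, P.3;
# y and x are constant within an ID and are carried over unchanged.
wide <- reshape(long, idvar = "ID", timevar = "k", v.names = "P",
                direction = "wide")

# Each P measurement is now its own regressor.
fit <- lm(y ~ P.1 + P.2 + P.3 + x, data = wide)
summary(fit)
```

With the question's own data the same steps should run, but with only three IDs lm would report NA for some coefficients, so more IDs are needed for a meaningful fit.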