Linear regression by column name with dplyr

Date: 2019-06-19 17:44:38

Tags: r dplyr linear-regression

I have the following data frame, in which each row contains four dates ("y") and four measurements ("x"):

df = structure(list(
  x1 = c(69.772808673525, NA, 53.13125414839, 17.3033274666411, NA,
         38.6120670385487, 57.7229000792707, 40.7654208618078,
         38.9010405201831, 65.7108936694177),
  y1 = c(0.765671296296296, NA, 1.37539351851852, 0.550277777777778, NA,
         0.83037037037037, 0.0254398148148148, 0.380671296296296,
         1.368125, 2.5250462962963),
  x2 = c(81.3285388496182, NA, NA, 44.369872853302, NA, 61.0746827226573,
         66.3965114460601, 41.4256874481852, 49.5461413070349,
         47.0936997726146),
  y2 = c(6.58287037037037, NA, NA, 9.09377314814815, NA, 7.00127314814815,
         6.46597222222222, 6.2462962962963, 6.76976851851852,
         8.12449074074074),
  x3 = c(NA, 60.4976916064608, NA, 45.3575294731303, 45.159758146854,
         71.8459173097114, NA, 37.9485456227131, 44.6307631013742,
         52.4523342186143),
  y3 = c(NA, 12.0026157407407, NA, 13.5601157407407, 16.1213657407407,
         15.6431018518519, NA, 15.8986805555556, 13.1395138888889,
         17.9432638888889),
  x4 = c(NA, NA, NA, 57.3383407228293, NA, 59.3921356160536,
         67.4231673171527, 31.853845252547, NA, NA),
  y4 = c(NA, NA, NA, 18.258125, NA, 19.6074768518519, 20.9696527777778,
         23.7176851851852, NA, NA)),
  class = c("tbl_df", "tbl", "data.frame"), row.names = c(NA, -10L))

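I want to add a column that holds, for each row, the slope of all the y values against all the x values (each row is one patient with these 4 measurements).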

Here is what I have so far:

df <- df %>% mutate(Slope = lm(vars(starts_with("y")) ~ vars(starts_with("x")), data = .))

I get this error:

invalid type (list) for variable 'vars(starts_with("y"))'...

What am I doing wrong, and how can I calculate the slope for each row?

1 Answer

Answer (score: 0):

You are using tidyverse syntax, but your data is not tidy...

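You should probably reshape the data.frame and reconsider how you store your data. Here is a quick-and-dirty approach (assuming I understood your explanation correctly):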

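# reshape() is base R, so work on a plain data.frame rather than a tibble
df <- as.data.frame(df)
# odd columns are x1..x4, even columns are y1..y4; each row becomes one "patient"
df <- merge(reshape(df[, (1:4)*2 - 1], direction = "long", varying = list(1:4), v.names = "x", idvar = "patient"),
            reshape(df[, (1:4)*2],     direction = "long", varying = list(1:4), v.names = "y", idvar = "patient"))
df$patient <- factor(df$patient)  # merge() joined the two long tables by patient and time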
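Since the point is to get the data into a tidy (long) shape, the same wide-to-long reshape can also be sketched with tidyr. This is an illustration, not the answer's method: it assumes tidyr 1.0.0 or later (for `pivot_longer()`), the `x1, y1, ..., x4, y4` column naming from the question, a hypothetical `wide` copy of the question's data, and that each original row is one patient identified by its row number.

```r
library(tidyr)  # assumes tidyr >= 1.0.0 for pivot_longer()

# The question's data (hypothetical name `wide`), columns x1, y1, ..., x4, y4
wide <- data.frame(
  x1 = c(69.772808673525, NA, 53.13125414839, 17.3033274666411, NA,
         38.6120670385487, 57.7229000792707, 40.7654208618078,
         38.9010405201831, 65.7108936694177),
  y1 = c(0.765671296296296, NA, 1.37539351851852, 0.550277777777778, NA,
         0.83037037037037, 0.0254398148148148, 0.380671296296296,
         1.368125, 2.5250462962963),
  x2 = c(81.3285388496182, NA, NA, 44.369872853302, NA, 61.0746827226573,
         66.3965114460601, 41.4256874481852, 49.5461413070349,
         47.0936997726146),
  y2 = c(6.58287037037037, NA, NA, 9.09377314814815, NA, 7.00127314814815,
         6.46597222222222, 6.2462962962963, 6.76976851851852,
         8.12449074074074),
  x3 = c(NA, 60.4976916064608, NA, 45.3575294731303, 45.159758146854,
         71.8459173097114, NA, 37.9485456227131, 44.6307631013742,
         52.4523342186143),
  y3 = c(NA, 12.0026157407407, NA, 13.5601157407407, 16.1213657407407,
         15.6431018518519, NA, 15.8986805555556, 13.1395138888889,
         17.9432638888889),
  x4 = c(NA, NA, NA, 57.3383407228293, NA, 59.3921356160536,
         67.4231673171527, 31.853845252547, NA, NA),
  y4 = c(NA, NA, NA, 18.258125, NA, 19.6074768518519, 20.9696527777778,
         23.7176851851852, NA, NA)
)

# Assumption: one patient per original row, identified by row number
wide$patient <- seq_len(nrow(wide))

long <- pivot_longer(
  wide,
  cols = -patient,
  names_to = c(".value", "time"),  # "x"/"y" prefix -> column name, digit -> time
  names_pattern = "([xy])(\\d)"
)
long$time <- as.integer(long$time)

head(long)
```

The `.value` entry in `names_to` turns the `x`/`y` prefix into column names and puts the digit in `time`, so `long` has the same patient/time/x/y layout as the merged `reshape()` result (40 rows, NAs kept).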

Then you can loop over the patients, fit a linear regression for each one, and collect the slopes in a vector:

sapply(levels(df$patient), function(pat) {
  # fit y ~ x on this patient's rows; rows with NA in x or y are dropped
  coef(lm(y ~ x, data = df[df$patient == pat, ], na.action = na.omit))[["x"]]
})
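The question asked for the slope as an extra `Slope` column of the original wide data. A minimal base-R sketch that skips the reshape, assuming the x and y columns come in matching `x1..x4` / `y1..y4` order; `wide` and `row_slope()` are hypothetical names used only for illustration. Rows with fewer than two complete (x, y) pairs get `NA`; for a single pair this matches the `NA` slope that `lm()` reports.

```r
# The question's data (hypothetical name `wide`), columns x1, y1, ..., x4, y4
wide <- data.frame(
  x1 = c(69.772808673525, NA, 53.13125414839, 17.3033274666411, NA,
         38.6120670385487, 57.7229000792707, 40.7654208618078,
         38.9010405201831, 65.7108936694177),
  y1 = c(0.765671296296296, NA, 1.37539351851852, 0.550277777777778, NA,
         0.83037037037037, 0.0254398148148148, 0.380671296296296,
         1.368125, 2.5250462962963),
  x2 = c(81.3285388496182, NA, NA, 44.369872853302, NA, 61.0746827226573,
         66.3965114460601, 41.4256874481852, 49.5461413070349,
         47.0936997726146),
  y2 = c(6.58287037037037, NA, NA, 9.09377314814815, NA, 7.00127314814815,
         6.46597222222222, 6.2462962962963, 6.76976851851852,
         8.12449074074074),
  x3 = c(NA, 60.4976916064608, NA, 45.3575294731303, 45.159758146854,
         71.8459173097114, NA, 37.9485456227131, 44.6307631013742,
         52.4523342186143),
  y3 = c(NA, 12.0026157407407, NA, 13.5601157407407, 16.1213657407407,
         15.6431018518519, NA, 15.8986805555556, 13.1395138888889,
         17.9432638888889),
  x4 = c(NA, NA, NA, 57.3383407228293, NA, 59.3921356160536,
         67.4231673171527, 31.853845252547, NA, NA),
  y4 = c(NA, NA, NA, 18.258125, NA, 19.6074768518519, 20.9696527777778,
         23.7176851851852, NA, NA)
)

# Hypothetical helper: OLS slope of y on x using only complete (x, y) pairs
row_slope <- function(x, y) {
  ok <- !is.na(x) & !is.na(y)
  if (sum(ok) < 2) return(NA_real_)  # a line needs at least two points
  unname(coef(lm(y[ok] ~ x[ok]))[2])
}

# Assumes x1..x4 and y1..y4 appear in the same order, so positions pair up
x_cols <- grep("^x", names(wide))
y_cols <- grep("^y", names(wide))

wide$Slope <- vapply(seq_len(nrow(wide)), function(i) {
  row_slope(unlist(wide[i, x_cols]), unlist(wide[i, y_cols]))
}, numeric(1))

wide$Slope
```

In the question's data, rows 2, 3 and 5 have only one observed (x, y) pair, so their `Slope` is `NA`.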