Question

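I have a large data set in the following format: the first column is the type, and the subsequent columns are the different times at which that 'type' occurred. For each row (~7000 rows), I want to calculate the slope of the subset T0-T2, then the slope of t0-t2, output that information, and then get the average of the slopes for each row. For example, get the slopes of the subsets T0-T2 and t0-t2 for type1, then take the average of those two values for the type1 row. Some rows are missing data entirely, while others are missing one or two values.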

Type    T0   T1   T2   t0   t1   t2  
type1  0.2  0.3  0.4  0.3  0.2  0.1 
type2  1.4  2.5  3.4  1.5  0.5  3.4
type3  0.4  8.1  8.1       2.2
type4        
...

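I am a beginner in R, so trying to do this has been challenging, even though in my head it seems simple. I am getting errors about missing values (NA), and I would greatly appreciate any ideas or pointers to similar questions on this site. Thanks.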

Answer 1

First, you will probably want to write a function that calculates the slope of three consecutive values, like this:

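slope  <-  function(x){
    if(all(is.na(x)))
        # if x is all missing, then lm will throw an error that we want to avoid
        return(NA)
    else
        # regress the values on their time index (1, 2, 3),
        # so the slope is the change in value per time step
        return(coef(lm(x ~ I(1:3)))[2])
}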

Then you can use the apply() function to calculate the slope for each row (MARGIN = 1), like this:

df <- read.csv(text = 
"Type,T0,T1,T2,t0,t1,t2
type1,0.2,0.3,0.4,0.3,0.2,0.1 
type2,1.4,2.5,3.4,1.5,0.5,3.4
type3,0.4,8.1,8.1,,2.2,")


df$slope1  <-  
    apply(df[,c('T0','T1','T2')],
          1,
          slope)

df$slope2  <-  
    apply(df[,c('t0','t1','t2')],
          1,
          slope)

Then calculate the average slope:

df$average.slope  <-  (df$slope1 + df$slope2)/2
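
Note that the sum above is NA whenever either of the two slopes is NA. If the goal is to average whichever slopes are available for a row, one possible variant is rowMeans() with na.rm = TRUE. This is only a sketch: the column names slope1 and slope2 follow the code above, while the small data frame `slopes` and its values are made up for illustration.

```r
# Hypothetical slopes for three rows; NA marks a subset that could not be fitted
slopes <- data.frame(slope1 = c(0.1, 1.0, NA),
                     slope2 = c(-0.1, NA, NA))

# Average the slopes that are available in each row, ignoring NAs
slopes$average.slope <- rowMeans(slopes[, c("slope1", "slope2")], na.rm = TRUE)

# rowMeans() gives NaN when every value in a row is NA; turn that back into NA
slopes$average.slope[is.nan(slopes$average.slope)] <- NA
```

Whether a row with only one usable slope should get an average at all depends on the analysis, so this is a design choice rather than a correction.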

Answer 2

You can get the slope for each row, for example:

#dat <- read.table(text="Type    T0   T1   T2   t0   t1   t2  
#type1  0.2  0.3  0.4  0.3  0.2  0.1 
#type2  1.4  2.5  3.4  1.5  0.5  3.4
#type3  0.4  8.1  8.1   NA  2.2   NA",header=TRUE)

tapply(
  dat[c("T0","T1","T2")],
  dat["Type"],
  FUN=function(x) 
    coef(lm(unlist(x) ~ seq_along(x)))[-1]
)

#Type
#type1 type2 type3 
# 0.10  1.00  3.85
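
For rows with missing values, the same least-squares slope can also be computed directly as cov(time, value) / var(time) instead of fitting a model per row. The following is a minimal sketch under a few assumptions: the helper name `row_slope` and the output columns `slope_T` and `slope_t` are hypothetical, time points are taken to be equally spaced (1, 2, 3), and the empty type4 row from the question is entered as all NA.

```r
# Least-squares slope of y against its position 1..n, skipping missing values.
# Returns NA when fewer than two values are present (no slope can be defined).
row_slope <- function(y) {
  t <- seq_along(y)
  ok <- !is.na(y)
  if (sum(ok) < 2) return(NA_real_)
  cov(t[ok], y[ok]) / var(t[ok])
}

dat <- read.table(text = "Type T0 T1 T2 t0 t1 t2
type1 0.2 0.3 0.4 0.3 0.2 0.1
type2 1.4 2.5 3.4 1.5 0.5 3.4
type3 0.4 8.1 8.1 NA 2.2 NA
type4 NA NA NA NA NA NA", header = TRUE)

# One slope per row for each subset of columns
dat$slope_T <- apply(dat[c("T0", "T1", "T2")], 1, row_slope)
dat$slope_t <- apply(dat[c("t0", "t1", "t2")], 1, row_slope)
```

For the T0-T2 columns this should reproduce the slopes shown above for type1 to type3, and it returns NA for a row such as type4 that has no data.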

Calculating the slope of each row in a large data set using R
