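R: part of the code in an if statement is not executed

Asked: 2018-09-23 03:39:24

Tags: r for-loop if-statement

I am trying to convert raw data into the start-stop format used for Cox regression. My original dataset looks like this: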

df = data.frame(initial = c(25, 25, 20, 21, 21, 17), 
                total = c(4.25, 28, 0.5, 38, 14, 43), 
                age = c(30, 53, 20, 59, 35, 60), 
                ethanol = c(0.04, 0.306, 0.201, 0.222, 0.047, 0.085), 
                status = c(0, 0, 0, 0, 0, 1))

For example, for the first observation, the original data format is:

    initial  total  age  ethanol  status
 1  25       4.25   30    0.04    0

The desired data format is:

 id  start  stop     ethanol  status
 1   0.00   25.00    0.00     0
 1   25.00  29.25    0.04     0
 1   29.25  30       0        0

So I wrote the following code:

edf = data.frame(id = integer(), 
                 start = numeric(), 
                 stop = numeric(), 
                 ethanol = numeric(),
                 status = integer())

j = 1

for( i in 1:4){

  if( (df[i, 1] + df[i,2]) >= df[i,3] ){
    edf[j,1] = i
    edf[j,2] = 0
    edf[j,3] = df[i,"initial"]
    edf[j,4] = 0
    edf[j,5] = 0
    j = j+1
    edf[j,1] = i
    edf[j,2] = df[i,"initial"]
    edf[j,3] = df[i,"initial"] + df[i,"total"]
    edf[j,4] = df[i,"ethanol"]
    edf[j,5] = df[i,"status"]
  } else{
    edf[j,1] = i
    edf[j,2] = 0
    edf[j,3] = df[i,"initial"]
    edf[j,4] = 0
    edf[j,5] = 0

    j = j+1
    edf[j,1] = i
    edf[j,2] = df[i,"initial"]
    edf[j,3] = df[i,"initial"] + df[i,"total"]
    edf[j,4] = df[i,"ethanol"]
    edf[j,5] = 0

    j = j+1
    edf[j,1] = i
    edf[j,2] = df[i,"initial"] + df[i,"total"]
    edf[j,3] = df[i,"age"]
    edf[j,4] = 0
    edf[j,5] = df[i,"status"]
  }
}

But the data frame I get is (for the first observation, for example):

 id     start    stop    ethanol  status
 1      0.00     25.00   0.00     0
 1      25.00    29.25   0.04     0

One row is missing:

id  start    stop    ethanol  status
1   29.25    30      0        0

It seems that the last part of the else statement is not executed:

    j = j+1
    edf[j,1] = i
    edf[j,2] = df[i,"initial"] + df[i,"total"]
    edf[j,3] = df[i,"age"]
    edf[j,4] = 0
    edf[j,5] = df[i,"status"]

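I don't know what is wrong. Any suggestions? I am using R version 3.4.4 on macOS (x86_64-apple-darwin15.6.0). Thanks in advance!

1 answer:

Answer 0 (score: 0)

You don't increment the row number j before writing the first row in each iteration of the loop, so every iteration starts by writing over the last row of the previous one. For the first observation the else branch does run and writes all three rows, but the next iteration then overwrites the third. The following will work.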

j = 0

for( i in 1:4){
  j = j + 1
  if( (df[i, 1] + df[i,2]) >= df[i,3] ){
    edf[j,1] = i
    edf[j,2] = 0
    edf[j,3] = df[i,"initial"]
    edf[j,4] = 0
    edf[j,5] = 0
    j = j+1
    edf[j,1] = i
    edf[j,2] = df[i,"initial"]
    edf[j,3] = df[i,"initial"] + df[i,"total"]
    edf[j,4] = df[i,"ethanol"]
    edf[j,5] = df[i,"status"]
  } else{
    edf[j,1] = i
    edf[j,2] = 0
    edf[j,3] = df[i,"initial"]
    edf[j,4] = 0
    edf[j,5] = 0

    j = j+1
    edf[j,1] = i
    edf[j,2] = df[i,"initial"]
    edf[j,3] = df[i,"initial"] + df[i,"total"]
    edf[j,4] = df[i,"ethanol"]
    edf[j,5] = 0

    j = j+1
    edf[j,1] = i
    edf[j,2] = df[i,"initial"] + df[i,"total"]
    edf[j,3] = df[i,"age"]
    edf[j,4] = 0
    edf[j,5] = df[i,"status"]
  }
}
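To see the overwrite described above without building the whole data frame, here is a minimal sketch that only tracks which row indices each iteration of the question's original loop touches. The names `first_row`, `last_row` and `n` are made up for this illustration; the data are the first four rows of the question's `df`.

```r
# First four observations from the question (only the columns the condition uses)
df <- data.frame(initial = c(25, 25, 20, 21),
                 total   = c(4.25, 28, 0.5, 38),
                 age     = c(30, 53, 20, 59))

j <- 1                       # same starting index as the question's loop
first_row <- numeric(0)      # hypothetical bookkeeping vectors, not in the original
last_row  <- numeric(0)

for (i in 1:4) {
  # the if branch writes 2 rows, the else branch writes 3
  n <- if (df$initial[i] + df$total[i] >= df$age[i]) 2 else 3
  first_row[i] <- j
  j <- j + n - 1             # the original loop increments j only BETWEEN rows
  last_row[i] <- j
  cat("i =", i, ": writes rows", first_row[i], "to", last_row[i], "\n")
}
```

Each iteration after the first begins on the row where the previous one ended, which is why the third row of observation 1 disappears.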

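Edit: there are better ways to do this. There may be packages that make reshaping the data easier. Alternatively, you could build the three start/stop steps as separate data frames and then combine them. Failing that, you can at least simplify it like this: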

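# edf must be the empty data frame defined in the question when this starts
df$end = df$initial + df$total
# iterate over integer row indices; rownames(df) would give character ids
for (i in seq_len(nrow(df))) {
    r = df[i,]
    # first interval: 0 to initial, no exposure, no event
    edf[nrow(edf) + 1,] = list(i, 0, r$initial, 0, 0)
    if (r$end >= r$age) {
      # exposure lasts to the end of follow-up: status goes on this row
      edf[nrow(edf) + 1,] = list(i, r$initial, r$end, r$ethanol, r$status)
    } else {
      # exposure stops before age: add an unexposed tail that carries the status
      edf[nrow(edf) + 1,] = list(i, r$initial, r$end, r$ethanol, 0)
      edf[nrow(edf) + 1,] = list(i, r$end, r$age, 0, r$status)
    }
}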
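The "separate data frames, then combine" idea mentioned in the edit could be sketched roughly as below. This is only one possible reading of that suggestion, not code from the original answer: the names `p1`, `p2`, `p3`, `has_tail` and `edf2` are made up here, and unlike the question's loop (which stops at `1:4`) it processes every row of `df`.

```r
df <- data.frame(initial = c(25, 25, 20, 21, 21, 17),
                 total   = c(4.25, 28, 0.5, 38, 14, 43),
                 age     = c(30, 53, 20, 59, 35, 60),
                 ethanol = c(0.04, 0.306, 0.201, 0.222, 0.047, 0.085),
                 status  = c(0, 0, 0, 0, 0, 1))

id       <- seq_len(nrow(df))
end      <- df$initial + df$total
has_tail <- end < df$age     # TRUE where a third interval (end to age) is needed

# Step 1: 0 to initial, no exposure, no event
p1 <- data.frame(id = id, start = 0, stop = df$initial,
                 ethanol = 0, status = 0)

# Step 2: initial to end, exposed; carries the status only when it is the last step
p2 <- data.frame(id = id, start = df$initial, stop = end,
                 ethanol = df$ethanol,
                 status = ifelse(has_tail, 0, df$status))

# Step 3: end to age, only for rows where exposure stops before age
p3 <- data.frame(id = id[has_tail], start = end[has_tail],
                 stop = df$age[has_tail],
                 ethanol = rep(0, sum(has_tail)),
                 status = df$status[has_tail])

# Combine the pieces and sort so each id's intervals are in time order
edf2 <- rbind(p1, p2, p3)
edf2 <- edf2[order(edf2$id, edf2$start), ]
rownames(edf2) <- NULL
print(head(edf2, 3))
```

Because no row index is carried between observations, this version cannot overwrite a previous row the way the original loop did.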