I'm trying to convert raw data into the start-stop format for Cox regression. My original dataset looks like this:
df = data.frame(initial = c(25, 25, 20, 21, 21, 17),
total = c(4.25, 28, 0.5, 38, 14, 43),
age = c(30, 53, 20, 59, 35, 60),
ethanol = c(0.04, 0.306, 0.201, 0.222, 0.047, 0.085),
status = c(0, 0, 0, 0, 0, 1))
For example, for the first observation the raw data looks like this:
initial total age ethanol status
1 25 4.25 30 0.04 0
The desired format is:
id start stop ethanol status
1 0.00 25.00 0.00 0
1 25.00 29.25 0.04 0
1 29.25 30 0 0
So I wrote the following code:
edf = data.frame(id = integer(),
start = numeric(),
stop = numeric(),
ethanol = numeric(),
status = integer())
j = 1
for( i in 1:4){
if( (df[i, 1] + df[i,2]) >= df[i,3] ){
edf[j,1] = i
edf[j,2] = 0
edf[j,3] = df[i,"initial"]
edf[j,4] = 0
edf[j,5] = 0
j = j+1
edf[j,1] = i
edf[j,2] = df[i,"initial"]
edf[j,3] = df[i,"initial"] + df[i,"total"]
edf[j,4] = df[i,"ethanol"]
edf[j,5] = df[i,"status"]
} else{
edf[j,1] = i
edf[j,2] = 0
edf[j,3] = df[i,"initial"]
edf[j,4] = 0
edf[j,5] = 0
j = j+1
edf[j,1] = i
edf[j,2] = df[i,"initial"]
edf[j,3] = df[i,"initial"] + df[i,"total"]
edf[j,4] = df[i,"ethanol"]
edf[j,5] = 0
j = j+1
edf[j,1] = i
edf[j,2] = df[i,"initial"] + df[i,"total"]
edf[j,3] = df[i,"age"]
edf[j,4] = 0
edf[j,5] = df[i,"status"]
}
}
But the data frame I get is (for the first observation, for example):
id start stop ethanol status
1 0.00 25.00 0.00 0
1 25.00 29.25 0.04 0
One row is missing:
id start stop ethanol status
1 29.25 30 0 0
It seems the last part of the else branch is not executed:
j = j+1
edf[j,1] = i
edf[j,2] = df[i,"initial"] + df[i,"total"]
edf[j,3] = df[i,"age"]
edf[j,4] = 0
edf[j,5] = df[i,"status"]
I don't know what's wrong. Any suggestions? I'm using R version 3.4.4 on macOS (x86_64-apple-darwin15.6.0). Thanks in advance!
Answer 0 (score: 0)
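You never increment the row counter j before writing the first row in each iteration of the loop, so the first row of each iteration overwrites the last row written by the previous one. The following will work.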
j = 0
for( i in 1:4){
j = j + 1
if( (df[i, 1] + df[i,2]) >= df[i,3] ){
edf[j,1] = i
edf[j,2] = 0
edf[j,3] = df[i,"initial"]
edf[j,4] = 0
edf[j,5] = 0
j = j+1
edf[j,1] = i
edf[j,2] = df[i,"initial"]
edf[j,3] = df[i,"initial"] + df[i,"total"]
edf[j,4] = df[i,"ethanol"]
edf[j,5] = df[i,"status"]
} else{
edf[j,1] = i
edf[j,2] = 0
edf[j,3] = df[i,"initial"]
edf[j,4] = 0
edf[j,5] = 0
j = j+1
edf[j,1] = i
edf[j,2] = df[i,"initial"]
edf[j,3] = df[i,"initial"] + df[i,"total"]
edf[j,4] = df[i,"ethanol"]
edf[j,5] = 0
j = j+1
edf[j,1] = i
edf[j,2] = df[i,"initial"] + df[i,"total"]
edf[j,3] = df[i,"age"]
edf[j,4] = 0
edf[j,5] = df[i,"status"]
}
}
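To see the overwrite in isolation, here is a minimal sketch that is not the code from the question: `rows_per_subject`, `fill` and the row labels are made-up stand-ins. The counter is handled the same way as above, advancing between rows of one subject but not after the last one, and the only difference between the two runs is whether `j` is incremented at the top of each iteration.

```r
# Hypothetical stand-in for the loop: subject 1 writes 3 rows,
# subject 2 writes 2 rows.
rows_per_subject <- c(3, 2)

fill <- function(increment_at_top) {
  out <- character()
  j <- if (increment_at_top) 0 else 1
  for (i in seq_along(rows_per_subject)) {
    if (increment_at_top) j <- j + 1
    for (k in seq_len(rows_per_subject[i])) {
      out[j] <- paste0("subject", i, "_row", k)
      # j advances only between rows of the same subject
      if (k < rows_per_subject[i]) j <- j + 1
    }
  }
  out
}

buggy <- fill(increment_at_top = FALSE)  # subject 1's third row is overwritten
fixed <- fill(increment_at_top = TRUE)   # all five rows survive
```

Comparing `buggy` and `fixed` shows that the last row of subject 1 is the one that gets lost, which matches the missing third row in the question.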
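Edit: There are better ways to do this. There may be packages that make reshaping the data easier. Alternatively, you could create the three start/stop segments as separate data frames and then combine them. Failing that, you can at least simplify it like this: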
df$end = df$initial + df$total
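# loop over integer row indices so the id column stays integer
# (rownames() returns character strings)
for (i in seq_len(nrow(df))) {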
r = df[i,]
edf[nrow(edf) + 1,] = list(i, 0, r$initial, 0, 0)
if (r$end >= r$age){
edf[nrow(edf) + 1,] = list(i, r$initial, r$end, r$ethanol, r$status)
}
else {
edf[nrow(edf) + 1,] = list(i, r$initial, r$end, r$ethanol, 0)
edf[nrow(edf) + 1,] = list(i, r$end, r$age, 0, r$status)
}
}
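The other suggestion, building the three start/stop segments as separate data frames and combining them, might look roughly like the sketch below. It is only one possible reading of that idea: the names `seg1`, `seg2`, `seg3`, `has_tail` and `edf2` are assumptions introduced here, and it reuses the example data from the question.

```r
# Same example data as in the question
df <- data.frame(initial = c(25, 25, 20, 21, 21, 17),
                 total   = c(4.25, 28, 0.5, 38, 14, 43),
                 age     = c(30, 53, 20, 59, 35, 60),
                 ethanol = c(0.04, 0.306, 0.201, 0.222, 0.047, 0.085),
                 status  = c(0, 0, 0, 0, 0, 1))

id       <- seq_len(nrow(df))
end      <- df$initial + df$total
has_tail <- end < df$age   # TRUE where a third interval is needed

# Segment 1: from 0 to the start of exposure, no ethanol, no event
seg1 <- data.frame(id = id, start = 0, stop = df$initial,
                   ethanol = 0, status = 0)

# Segment 2: the exposure interval; it carries the status only when
# no third interval follows
seg2 <- data.frame(id = id, start = df$initial, stop = end,
                   ethanol = df$ethanol,
                   status = ifelse(has_tail, 0, df$status))

# Segment 3: from the end of exposure to age, only for rows with a tail
seg3 <- data.frame(id = id[has_tail], start = end[has_tail],
                   stop = df$age[has_tail],
                   ethanol = rep(0, sum(has_tail)),
                   status = df$status[has_tail])

# Combine the segments and sort them by subject and start time
edf2 <- rbind(seg1, seg2, seg3)
edf2 <- edf2[order(edf2$id, edf2$start), ]
rownames(edf2) <- NULL
```

Because every segment is computed for all rows at once, no row counter is needed, so the overwrite problem cannot occur. The branching on `end >= age` from the loop becomes the `has_tail` mask.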