R: How to reshape a dataset for survival analysis?

Date: 2019-06-25 06:22:07

Tags: r dataset survival-analysis

I am trying to perform survival analysis in R. My dataset looks like the one in the image.

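I want to analyze the difference in survival between the two types. S6, S12, S18, ... are the follow-up months (6 months, 12 months, 18 months, etc.) at which it is recorded whether the patient survived or died.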

Because the real dataset contains a large number of patients, calculating each patient's survival time by hand is not feasible.

Any help would be appreciated.
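Stated in code, the task is: for each patient, take the last follow-up column with a recorded status as the survival time, and mark an event if that status is a death. A minimal loop-free base-R sketch of that step, assuming the follow-up columns are named S0, S6, S12 and so on, hold "S" for alive, "D" for dead and "" when nothing was recorded, and are the only columns matching that name pattern. The data frame `wide` and its values are made up for illustration, as are the names `fu_cols`, `observed` and `surv_df`.

```r
# Hypothetical toy data in the wide layout described in the question:
# one row per patient, one column per follow-up month.
# Assumed coding: "S" = alive, "D" = dead, "" = no follow-up recorded.
wide <- data.frame(
  pt   = 1:6,
  type = rep(0:1, 3),
  S0   = c("S", "S", "S", "", "S", "S"),
  S6   = c("S", "", "S", "", "S", "S"),
  S12  = c("S", "", "D", "", "D", ""),
  stringsAsFactors = FALSE
)

# Follow-up columns, put in chronological order by the month in their name.
fu_cols <- grep("^[Ss][0-9]+$", names(wide), value = TRUE)
months  <- as.numeric(sub("^[Ss]", "", fu_cols))
ord     <- order(months)
fu_cols <- fu_cols[ord]
months  <- months[ord]

# Character matrix of statuses and a 0/1 matrix marking recorded visits.
status_mat <- as.matrix(wide[, fu_cols, drop = FALSE])
observed   <- matrix(status_mat %in% c("S", "D"), nrow = nrow(status_mat)) * 1

# Column index of the last recorded visit in each row. Rows with no
# recorded visit are dropped below, so their max.col() value is irrelevant.
last_idx <- max.col(observed, ties.method = "last")
has_fu   <- rowSums(observed) > 0

# One row per patient with any follow-up: time in months, status 1 = death.
last_status <- status_mat[cbind(seq_len(nrow(status_mat)), last_idx)]
surv_df <- data.frame(
  pt     = wide$pt,
  type   = wide$type,
  time   = months[last_idx],
  status = as.integer(last_status == "D")
)[has_fu, ]
surv_df
```

Because `max.col()` works on the whole matrix at once, there is no per-patient loop. Note that a death is only known to the follow-up month at which it was recorded, so that month is used as the event time.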

1 answer:

Answer 0 (score: 1)

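I hope the following works for you. Let me know if anything goes wrong; I have commented each step so you can follow what I did. There are more elegant ways to do the data manipulation with the tidyverse, but I'm short on time right now, so I did it quickly. In future, please include a sample of your dataset if possible; it makes specific problems much easier to track down.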

# Import the dataset here - I'm generating mine from scratch but import yours using read.csv or similar

df <- data.frame(pt = 1:6, type = rep(0:1, 3), s0 = c("S", "S", "S", "", "S", "S"), 
              s6 = c("S", "", "S", "", "S", "S"), s12 = c("S", "", "D", "", "D", ""))

# Data manipulation starts here
library(tidyr)
# Replace s12 below with the name of your last followup column
long_df <- tidyr::gather(df, followupRaw, status, s0:s12) # Converts to long form (one row per patient per followup), which is easier to work with; a warning that attributes are not identical across measure variables can be ignored here
long_df$status <- factor(long_df$status, levels = c("S", "D")) # Converts to factor and coerces blank to NA values (important)
long_df$followup <- as.numeric(substring(long_df$followupRaw, 2)) # Removes the s character to convert each followup to an integer - make sure all followup periods are in the sXX format

# Form final dataframe for survival analysis
survivalDf <- data.frame(patient = unique(df$pt)) # Single row per patient
survivalDf$followupAvailable <- sapply(survivalDf$patient, function(x) sum(!is.na(long_df[long_df$pt == x, "status"])) > 0) # Logical vector identifying patients with at least one recorded followup event (TRUE) versus none (FALSE)
survivalDf <- survivalDf[survivalDf$followupAvailable,] # Remove patients with no followup
survivalDf$type <- sapply(survivalDf$patient, function(x) df[df$pt == x, "type"]) # Add type to final dataframe
survivalDf$lastFollowup <- sapply(survivalDf$patient, function(x) max(long_df[long_df$pt == x & !is.na(long_df$status), "followup"])) # Find last instance of followup
survivalDf$status <- mapply(function(x, y) long_df[long_df$pt == x & long_df$followup == y, "status"], survivalDf$patient, survivalDf$lastFollowup) # Add patient status at last followup
survivalDf$statusKM <- as.integer(survivalDf$status == "D") # Kaplan-Meier coding: 0 = censored (alive at last followup), 1 = death

# Survival analysis here
library(survival)
library(survminer)
km <- survfit(Surv(lastFollowup, statusKM) ~ type, data = survivalDf)
summary(km) # Survival table for each type
ggsurvplot(km, data = survivalDf) # Kaplan-Meier curves by type
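The Kaplan-Meier curves show the difference between the two types visually. To test it formally, the survival package also provides the log-rank test through `survdiff()`, which takes the same formula as `survfit()`. A minimal sketch on made-up data (the data frame `km_df` and all of its values are hypothetical), using the same coding as `statusKM` above, 1 = death and 0 = censored:

```r
library(survival)

# Hypothetical per-patient data in the same shape as survivalDf:
# time = month of last follow-up, status = 1 (death) or 0 (censored).
km_df <- data.frame(
  type   = rep(c(0, 1), each = 5),
  time   = c(6, 12, 12, 18, 24, 12, 18, 24, 24, 24),
  status = c(1, 1, 0, 1, 0, 0, 1, 0, 0, 0)
)

# Log-rank test of the null hypothesis that both types share one survival curve.
lr <- survdiff(Surv(time, status) ~ type, data = km_df)
print(lr)

# p-value from the chi-squared statistic, with (number of groups - 1) df.
p_value <- pchisq(lr$chisq, df = length(lr$n) - 1, lower.tail = FALSE)
p_value
```

On the answer's data the call would be `survdiff(Surv(lastFollowup, statusKM) ~ type, data = survivalDf)`, and survminer's `ggsurvplot()` can print the log-rank p-value on the plot with `pval = TRUE`. With very few events the chi-squared approximation is rough.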