I have a dataset structured like this:
Patient ID Visit Date Dead Death Date Sex State
101 Feb/14 1 Jan/15 M 2
101 June/14 1 Jan/15 M 3
101 December/14 1 Jan/15 M 2
102 Jan/14 0 N/A M 1
102 April/14 0 N/A M 1
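If a patient has died, all of their visits are flagged with the "Dead" code and the death date.
If the Dead code is 1, I need to add a row as the patient's last visit (patient 101 here), with the death date in the "Visit Date" column and the "State" variable set to 5 (the code for the death state in my dataset).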
The dataset I want looks like this (the fourth data row is the one that matters):
Patient ID Visit Date Dead Death Date Sex State
101 Feb/14 1 Jan/15 M 2
101 June/14 1 Jan/15 M 3
101 December/14 1 Jan/15 M 2
101 Jan/15 1 Jan/15 M 5
102 Jan/14 0 N/A M 1
102 April/14 0 N/A M 1
Answer 0 (score: 1)
You can do the following:
df <- read.table(header=T, text='Patient_ID Visit_Date Dead Death_Date Sex State
101 Feb/14 1 Jan/15 M 2
101 June/14 1 Jan/15 M 3
101 December/14 1 Jan/15 M 2
102 Jan/14 0 N/A M 1
102 April/14 0 N/A M 1 ', stringsAsFactors=F)
df$Patient_ID <- as.numeric(df$Patient_ID) #this needs to be numeric
df <- rbind(df, list(101, 'Jan/15', 1, 'Jan/15', 'M', 5 )) #use rbind to add a row
> df[order(df$Patient_ID),] #sort on Patient ID and the last row is inserted where it should
Patient_ID Visit_Date Dead Death_Date Sex State
1 101 Feb/14 1 Jan/15 M 2
2 101 June/14 1 Jan/15 M 3
3 101 December/14 1 Jan/15 M 2
6 101 Jan/15 1 Jan/15 M 5
4 102 Jan/14 0 N/A M 1
5 102 April/14 0 N/A M 1
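So the only thing you really need is the rbind function, which appends a row to the end of a data.frame. Use it as rbind(<your_data.frame>, <a vector with the values to add>).
In our case, <your data frame> is df and <a vector with the values to add> is list(101, 'Jan/15', 1, 'Jan/15', 'M', 5).
It is better to pass the new row as a list rather than as a vector, because a list keeps each column's type in the data.frame unchanged. An atomic vector would coerce everything to character.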
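To see that coercion in action, here is a minimal sketch on a hypothetical three-column subset of the data (not the full dataset), appending the same row once as a list and once as an atomic vector built with c():

```r
# Hypothetical reduced version of the question's data
df <- data.frame(Patient_ID = c(101, 102),
                 Visit_Date = c("Feb/14", "Jan/14"),
                 State      = c(2L, 1L),
                 stringsAsFactors = FALSE)

# Row passed as a list: each element keeps its own type
good <- rbind(df, list(101, "Jan/15", 5))

# Row passed as an atomic vector: c() has already turned every value
# into character, and the numeric columns are coerced to match
bad <- rbind(df, c(101, "Jan/15", 5))

sapply(good, class)  # Patient_ID and State stay numeric
sapply(bad, class)   # Patient_ID and State become character
```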
Answer 1 (score: 0)
A data.table answer:
df <- read.table(header=T, text='Patient_ID Visit_Date Dead Death_Date Sex State
101 Feb/14 1 Jan/15 M 2
101 June/14 1 Jan/15 M 3
101 December/14 1 Jan/15 M 2
102 Jan/14 0 N/A M 1
102 April/14 0 N/A M 1 ', stringsAsFactors=F)
library(data.table)
DT <- as.data.table(df)
# take only the Patient_ID, Death indicator, Death date and sex
dead <- unique(DT[Death_Date != "N/A", c(1, 3, 4, 5), with = FALSE])
# move the death date to visited, assign '5' to state
dead[, c("Visit_Date", "State") := list(Death_Date, 5) ]
# recombine with original records
records <- rbind(DT, dead)
# parse "Mon/yy" as the 1st of that month (English month names assumed) so rows sort chronologically
records[order(records$Patient_ID, as.Date(paste0("01/", records$Visit_Date), format = "%d/%b/%y")), ]
Patient_ID Visit_Date Dead Death_Date Sex State
1: 101 Feb/14 1 Jan/15 M 2
2: 101 June/14 1 Jan/15 M 3
3: 101 December/14 1 Jan/15 M 2
4: 101 Jan/15 1 Jan/15 M 5
5: 102 Jan/14 0 N/A M 1
6: 102 April/14 0 N/A M 1
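Why a sort key built with format = "%b/%d" puts Jan/15 first: that format reads the number after the slash as a day of the month, and the missing year is filled in by the system (typically the current year), so "Jan/15" becomes January 15 and sorts before "Feb/14" (February 14). A small sketch comparing it with reading the number as a two-digit year, assuming English month names in the current locale:

```r
visits <- c("Feb/14", "June/14", "December/14", "Jan/15")

# Reads "14"/"15" as a day; all dates share the same (system-chosen) year
as_day <- as.Date(visits, format = "%b/%d")

# Prepends a day and reads the number as a two-digit year instead
as_year <- as.Date(paste0("01/", visits), format = "%d/%b/%y")

visits[order(as_day)]   # "Jan/15" comes first
visits[order(as_year)]  # chronological: Feb/14, June/14, December/14, Jan/15
```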
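Answer 2 (score: 0)
A few things are going on here. First, you should use a real NA
rather than the string "N/A". Second, you should convert these dates into an actual date format so that you can work with them (and so they sort correctly).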
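Both points are easy to check on the question's own labels. A minimal sketch: as plain strings the visit labels sort alphabetically rather than by time, and "N/A" only becomes a missing value when read.table is told so through na.strings:

```r
visits <- c("Feb/14", "June/14", "December/14", "Jan/15")

# Character vectors sort alphabetically, not chronologically
sort(visits)  # "December/14" "Feb/14" "Jan/15" "June/14"

# Without na.strings, "N/A" is just another string
raw   <- read.table(header = TRUE, text = "Death\nJan/15\nN/A")
fixed <- read.table(header = TRUE, text = "Death\nJan/15\nN/A", na.strings = "N/A")

is.na(raw$Death)    # FALSE FALSE
is.na(fixed$Death)  # FALSE TRUE
```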
dat <- read.table(header = TRUE, text = "ID Visit Dead Death Sex State
101 Feb/14 1 Jan/15 M 2
101 June/14 1 Jan/15 M 3
101 December/14 1 Jan/15 M 2
102 Jan/14 0 N/A M 1
102 April/14 0 N/A M 1 ",
na.strings = 'N/A')
## format dates helper
f_dt <- function(x) {
x <- as.character(x)
res <- sprintf('01/%s/%s', substr(x, 1, 3), gsub('\\D', '', x))
as.Date(res, '%d/%b/%y')
}
dat <- within(dat, {
Visit <- f_dt(Visit)
Death <- f_dt(Death)
})
## remove those not dead and take the last row
## assign values how you want
deaths <- dat[with(dat, !is.na(Death) & !duplicated(ID, fromLast = TRUE)), ]
deaths <- within(deaths, {
Visit <- Death
State <- 5
})
## combine everything and order
out <- rbind(dat, deaths)
out[with(out, order(ID, Visit)), ]
# ID Visit Dead Death Sex State
# 1 101 2014-02-01 1 2015-01-01 M 2
# 2 101 2014-06-01 1 2015-01-01 M 3
# 3 101 2014-12-01 1 2015-01-01 M 2
# 31 101 2015-01-01 1 2015-01-01 M 5
# 4 102 2014-01-01 0 <NA> M 1
# 5 102 2014-04-01 0 <NA> M 1
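Two pieces of this answer can be checked in isolation. First, !duplicated(ID, fromLast = TRUE) is TRUE only on the last row of each ID, which is why combining it with !is.na(Death) picks each dead patient's final visit. Second, f_dt() depends on %b, which matches month names in the current locale. The parse_mon_yy() function below is a hypothetical, locale-independent variant (not part of the answer above) that looks the month up in base R's English month.abb constant instead; is_dead is likewise just an illustrative name:

```r
ids     <- c(101, 101, 101, 102, 102)
is_dead <- c(TRUE, TRUE, TRUE, FALSE, FALSE)

# TRUE only on the last occurrence of each ID
last_row <- !duplicated(ids, fromLast = TRUE)  # FALSE FALSE TRUE FALSE TRUE
is_dead & last_row                             # only row 3: patient 101's last visit

# Hypothetical alternative to f_dt(): month lookup via month.abb, not the locale
parse_mon_yy <- function(x) {
  x   <- as.character(x)
  mon <- match(substr(x, 1, 3), month.abb)      # NA when x is NA or unknown
  yr  <- 2000L + as.integer(gsub("\\D", "", x))  # two-digit year -> 20yy
  s   <- sprintf("%d-%02d-01", yr, mon)
  s[is.na(mon) | is.na(yr)] <- NA               # keep missing dates missing
  as.Date(s, format = "%Y-%m-%d")
}

parse_mon_yy(c("Feb/14", "June/14", "December/14", "Jan/15", NA))
# "2014-02-01" "2014-06-01" "2014-12-01" "2015-01-01" NA
```

The 2000L offset assumes all years fall in 2000-2099, which holds for this data but is an assumption worth revisiting for other datasets.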