Add rows for patients in a dataset based on values in earlier rows

Time: 2015-01-30 15:41:35

Tags: r dataframe dataset

I have a dataset structured like this:

Patient ID    Visit Date         Dead     Death Date    Sex    State
101           Feb/14             1          Jan/15      M      2
101           June/14            1          Jan/15      M      3 
101           December/14        1          Jan/15      M      2
102           Jan/14             0          N/A         M      1
102           April/14           0          N/A         M      1 

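If a patient has died, all of their visits are flagged with the "Dead" code and the death date.

If the death code equals 1, I need to create a row as the last visit for that patient (patient 101 here), with the death date in the "Visit Date" column and the "State" variable set to 5 (the code for the death state in my dataset).

The dataset I want looks like this (the fourth data row is the important one):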

Patient ID    Visit Date         Dead     Death Date    Sex    State
101           Feb/14             1          Jan/15      M      2
101           June/14            1          Jan/15      M      3 
101           December/14        1          Jan/15      M      2
101           Jan/15             1          Jan/15      M      5 
102           Jan/14             0          N/A         M      1
102           April/14           0          N/A         M      1   

3 Answers:

Answer 0: (score: 1)

You can do the following:

df <- read.table(header=T, text='Patient_ID    Visit_Date         Dead     Death_Date    Sex    State
101           Feb/14             1          Jan/15      M      2
101           June/14            1          Jan/15      M      3 
101           December/14        1          Jan/15      M      2
102           Jan/14             0          N/A         M      1
102           April/14           0          N/A         M      1 ', stringsAsFactors=F)

df$Patient_ID <- as.numeric(df$Patient_ID) #this needs to be numeric

df <- rbind(df, list(101, 'Jan/15', 1, 'Jan/15', 'M', 5 )) #use rbind to add a row

> df[order(df$Patient_ID),] #sort on Patient ID and the last row is inserted where it should
  Patient_ID  Visit_Date Dead Death_Date Sex State
1        101      Feb/14    1     Jan/15   M     2
2        101     June/14    1     Jan/15   M     3
3        101 December/14    1     Jan/15   M     2
6        101      Jan/15    1     Jan/15   M     5
4        102      Jan/14    0        N/A   M     1
5        102    April/14    0        N/A   M     1

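So the only thing you really need is the rbind function, which adds a row at the end of a data.frame. Use it as rbind(<your_data.frame>, <a vector with the values to add>). In our case, <your_data.frame> is df and <a vector with the values to add> is list(101, 'Jan/15', 1, 'Jan/15', 'M', 5).

It is best to add the row as a list, because that keeps the column types of the data.frame unchanged. Using an atomic vector would coerce everything to character.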
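
A minimal sketch of that difference, using a small made-up data frame (df_small and its columns are illustrative, not part of the question's data):

```r
# Illustrative data frame with two numeric columns and one character column
df_small <- data.frame(id = c(101, 102),
                       sex = c("M", "F"),
                       state = c(2, 1),
                       stringsAsFactors = FALSE)

# Adding the row as a list: each element keeps its own type
with_list <- rbind(df_small, list(101, "M", 5))
sapply(with_list, class)    # id and state remain "numeric"

# Adding the row as an atomic vector: c(101, "M", 5) is a character vector,
# so every column it is appended to becomes character
with_vector <- rbind(df_small, c(101, "M", 5))
sapply(with_vector, class)  # all three columns are now "character"
```

Once the columns have turned into character, numeric comparisons and sorting on id or state no longer behave as expected, which is why the list form is the safer choice.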

Answer 1: (score: 0)

A data.table answer:

df <- read.table(header=T, text='Patient_ID    Visit_Date         Dead     Death_Date    Sex    State
101           Feb/14             1          Jan/15      M      2
101           June/14            1          Jan/15      M      3 
101           December/14        1          Jan/15      M      2
102           Jan/14             0          N/A         M      1
102           April/14           0          N/A         M      1 ', stringsAsFactors=F)

library(data.table)
DT <- as.data.table(df)
# take only the Patient_ID, Death indicator, Death date and sex 
dead <- unique(DT[Death_Date != "N/A", c(1, 3, 4, 5), with = FALSE])

# move the death date to visited, assign '5' to state
dead[, c("Visit_Date", "State") := list(Death_Date, 5)  ]

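# recombine with original records; match columns by name,
# because `dead` has its columns in a different order than DT
records <- rbind(DT, dead, use.names = TRUE)
# parse "Mon/yy" as the first day of that month so visits sort chronologically
records[ order(records$Patient_ID, as.Date(paste0("01/", records$Visit_Date), format = "%d/%b/%y")),]
   Patient_ID  Visit_Date Dead Death_Date Sex State
1:        101      Feb/14    1     Jan/15   M     2
2:        101     June/14    1     Jan/15   M     3
3:        101 December/14    1     Jan/15   M     2
4:        101      Jan/15    1     Jan/15   M     5
5:        102      Jan/14    0        N/A   M     1
6:        102    April/14    0        N/A   M     1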

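Answer 2: (score: 0)

There are a few things going on here. First, you should use NA rather than the string "N/A". Second, you should convert those dates to a proper date format so that you can work with them (and sort them correctly).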

dat <- read.table(header = TRUE, text = "ID    Visit         Dead     Death    Sex    State
101           Feb/14             1          Jan/15      M      2
101           June/14            1          Jan/15      M      3 
101           December/14        1          Jan/15      M      2
102           Jan/14             0          N/A         M      1
102           April/14           0          N/A         M      1 ",
                  na.strings = 'N/A')

## format dates helper
f_dt <- function(x) {
  x <- as.character(x)
  res <- sprintf('01/%s/%s', substr(x, 1, 3), gsub('\\D', '', x))
  as.Date(res, '%d/%b/%y')
}

dat <- within(dat, {
  Visit <- f_dt(Visit)
  Death <- f_dt(Death)
})

## remove those not dead and take the last row
## assign values how you want
deaths <- dat[with(dat, !is.na(Death) & !duplicated(ID, fromLast = TRUE)), ]
deaths <- within(deaths, {
  Visit <- Death
  State <- 5
})

## combine everything and order
out <- rbind(dat, deaths)
out[with(out, order(ID, Visit)), ]

#     ID      Visit Dead      Death Sex State
# 1  101 2014-02-01    1 2015-01-01   M     2
# 2  101 2014-06-01    1 2015-01-01   M     3
# 3  101 2014-12-01    1 2015-01-01   M     2
# 31 101 2015-01-01    1 2015-01-01   M     5
# 4  102 2014-01-01    0       <NA>   M     1
# 5  102 2014-04-01    0       <NA>   M     1
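
To illustrate why the date conversion matters for sorting, here is a small sketch. The names visits and to_date are made up for this example; to_date mirrors the f_dt helper above and assumes English month names in the current locale.

```r
visits <- c("Feb/14", "June/14", "December/14", "Jan/15")

# Sorting the raw strings is alphabetical, not chronological
# (method = "radix" makes the ordering independent of the locale)
sorted_as_text <- sort(visits, method = "radix")
# "December/14" "Feb/14" "Jan/15" "June/14"

# Same idea as f_dt: take the first three letters of the month and the
# two-digit year, and treat each value as the first day of that month
to_date <- function(x) {
  as.Date(sprintf("01/%s/%s", substr(x, 1, 3), gsub("\\D", "", x)),
          format = "%d/%b/%y")
}

# Ordering by the parsed dates gives the chronological sequence
sorted_as_date <- visits[order(to_date(visits))]
# "Feb/14" "June/14" "December/14" "Jan/15"
```

With the strings left as they are, the death visit in Jan/15 would land in the middle of the patient's history instead of at the end.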