Question

I have a dataset structured like this:

Patient ID    Visit Date         Dead     Death Date    Sex    State
101           Feb/14             1          Jan/15      M      2
101           June/14            1          Jan/15      M      3 
101           December/14        1          Jan/15      M      2
102           Jan/14             0          N/A         M      1
102           April/14           0          N/A         M      1

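If a patient has died, all of that patient's visits are flagged with the "Dead" code and the death date.

If the Dead code equals 1,

I need to create a row that becomes the last visit for patient 101,

with the death date in the "Visit Date" column

and the "State" variable set to 5 (the code for the death state in my dataset).

The dataset I want looks like this (the fourth row of data is the important one):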

Patient ID    Visit Date         Dead     Death Date    Sex    State
101           Feb/14             1          Jan/15      M      2
101           June/14            1          Jan/15      M      3 
101           December/14        1          Jan/15      M      2
101           Jan/15             1          Jan/15      M      5 
102           Jan/14             0          N/A         M      1
102           April/14           0          N/A         M      1

Answer 1

You can do the following:

df <- read.table(header=T, text='Patient_ID    Visit_Date         Dead     Death_Date    Sex    State
101           Feb/14             1          Jan/15      M      2
101           June/14            1          Jan/15      M      3 
101           December/14        1          Jan/15      M      2
102           Jan/14             0          N/A         M      1
102           April/14           0          N/A         M      1 ', stringsAsFactors=F)

df$Patient_ID <- as.numeric(df$Patient_ID) #this needs to be numeric

df <- rbind(df, list(101, 'Jan/15', 1, 'Jan/15', 'M', 5 )) #use rbind to add a row

> df[order(df$Patient_ID),] #sort on Patient ID and the last row is inserted where it should
  Patient_ID  Visit_Date Dead Death_Date Sex State
1        101      Feb/14    1     Jan/15   M     2
2        101     June/14    1     Jan/15   M     3
3        101 December/14    1     Jan/15   M     2
6        101      Jan/15    1     Jan/15   M     5
4        102      Jan/14    0        N/A   M     1
5        102    April/14    0        N/A   M     1
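The rbind call above hard-codes the values for patient 101. As a rough sketch of building the extra row for every deceased patient instead, assuming the same column names as the example and that every visit of a deceased patient carries Dead == 1 (the names dead and out are only illustrative):

```r
df <- read.table(header = TRUE, text = "Patient_ID Visit_Date Dead Death_Date Sex State
101 Feb/14      1 Jan/15 M 2
101 June/14     1 Jan/15 M 3
101 December/14 1 Jan/15 M 2
102 Jan/14      0 N/A    M 1
102 April/14    0 N/A    M 1", stringsAsFactors = FALSE)

# keep one row per deceased patient (the first one found is enough,
# because Dead, Death_Date and Sex are the same on every visit)
dead <- df[df$Dead == 1 & !duplicated(df$Patient_ID), ]

# turn that row into the "death visit"
dead$Visit_Date <- dead$Death_Date
dead$State <- 5L

# append the new rows, then sort by patient; order() is stable, so the
# appended row ends up after that patient's existing visits
out <- rbind(df, dead)
out <- out[order(out$Patient_ID), ]
out
```

This keeps the dates as text, so it only relies on the new rows being appended last; sorting by the actual visit date needs the dates parsed first.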

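The only thing you really need is the rbind function, which adds a row to the end of a data.frame. Use it as rbind( <your_data.frame> , <a vector with the values to add>). In our case, <your_data.frame> is df and <a vector with the values to add> is list(101, 'Jan/15', 1, 'Jan/15', 'M', 5 ).

It is better to add the row as a list, because that keeps the column types of the data.frame unchanged. Using an atomic vector would coerce everything to character.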
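A small sketch of the difference between adding a row as a list and as an atomic vector, using a made-up three-column data.frame (the names toy, with_list and with_vec are only for illustration):

```r
toy <- data.frame(id = c(101L, 102L),
                  sex = c("M", "F"),
                  state = c(2L, 1L),
                  stringsAsFactors = FALSE)

# a list keeps each value's own type, so the column types survive
with_list <- rbind(toy, list(103L, "M", 5L))
sapply(with_list, class)

# c() first coerces 103, "M" and 5 to a single character vector,
# so every column of the result becomes character
with_vec <- rbind(toy, c(103, "M", 5))
sapply(with_vec, class)
```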

Answer 2

A data.table answer:

df <- read.table(header=T, text='Patient_ID    Visit_Date         Dead     Death_Date    Sex    State
101           Feb/14             1          Jan/15      M      2
101           June/14            1          Jan/15      M      3 
101           December/14        1          Jan/15      M      2
102           Jan/14             0          N/A         M      1
102           April/14           0          N/A         M      1 ', stringsAsFactors=F)

library(data.table)
DT <- as.data.table(df)
# take only the Patient_ID, Dead indicator, Death_Date and Sex columns
# for patients that have a death date, one row per patient
dead <- unique(DT[Death_Date != "N/A", c(1, 3, 4, 5), with = FALSE])

# move the death date to visited, assign '5' to state
dead[, c("Visit_Date", "State") := list(Death_Date, 5)  ]

# recombine with original records
records <- rbind(DT, dead)
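# Visit_Date is month/year, so parse it with the year (day fixed to 01) before sorting;
# "%b/%d" would read the year as a day of the month and sort Jan/15 first
records[ order(records$Patient_ID, as.Date(paste0("01/", records$Visit_Date), format = "%d/%b/%y")),]
   Patient_ID  Visit_Date Dead Death_Date Sex State
1:        101      Feb/14    1     Jan/15   M     2
2:        101     June/14    1     Jan/15   M     3
3:        101 December/14    1     Jan/15   M     2
4:        101      Jan/15    1     Jan/15   M     5
5:        102      Jan/14    0        N/A   M     1
6:        102    April/14    0        N/A   M     1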
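The Visit_Date values here are month/two-digit-year strings, so a sort key has to be parsed with the year; a format such as "%b/%d" would treat the trailing number as a day of the month. A minimal sketch of that parsing, assuming an English locale for month names (parse_month_year is a hypothetical helper name, not part of the answer):

```r
# prepend a fixed day so that as.Date() has a complete date to parse;
# on input, %b accepts both abbreviated ("Feb") and full ("June") month names
parse_month_year <- function(x) {
  as.Date(paste0("01/", x), format = "%d/%b/%y")
}

visits <- c("Feb/14", "June/14", "December/14", "Jan/15")
parsed <- parse_month_year(visits)

# the death visit (Jan/15) now sorts after the 2014 visits
visits[order(parsed)]
```

Strings that are not dates, such as "N/A", come back as NA, which order() places last by default.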

Answer 3

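There are a few things going on here. First, you should use NA instead of the string 'N/A'. Second, you should convert those dates to a proper date format so that you can work with them (and sort them correctly).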

dat <- read.table(header = TRUE, text = "ID    Visit         Dead     Death    Sex    State
101           Feb/14             1          Jan/15      M      2
101           June/14            1          Jan/15      M      3 
101           December/14        1          Jan/15      M      2
102           Jan/14             0          N/A         M      1
102           April/14           0          N/A         M      1 ",
                  na.strings = 'N/A')

## format dates helper
f_dt <- function(x) {
  x <- as.character(x)
  res <- sprintf('01/%s/%s', substr(x, 1, 3), gsub('\\D', '', x))
  as.Date(res, '%d/%b/%y')
}

dat <- within(dat, {
  Visit <- f_dt(Visit)
  Death <- f_dt(Death)
})

## remove those not dead and take the last row
## assign values how you want
deaths <- dat[with(dat, !is.na(Death) & !duplicated(ID, fromLast = TRUE)), ]
deaths <- within(deaths, {
  Visit <- Death
  State <- 5
})

## combine everything and order
out <- rbind(dat, deaths)
out[with(out, order(ID, Visit)), ]

#     ID      Visit Dead      Death Sex State
# 1  101 2014-02-01    1 2015-01-01   M     2
# 2  101 2014-06-01    1 2015-01-01   M     3
# 3  101 2014-12-01    1 2015-01-01   M     2
# 31 101 2015-01-01    1 2015-01-01   M     5
# 4  102 2014-01-01    0       <NA>   M     1
# 5  102 2014-04-01    0       <NA>   M     1

Add rows for patients in a dataset based on values in preceding rows

3 answers: