How do I use a custom function to create a new binary variable in an existing data frame?

Date: 2017-01-04 02:52:00

Tags: r

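I am trying to create a custom function that generates new binary variables in an existing data frame. The idea is to be able to feed the function a diagnosis description (a string), the ICD9 diagnosis code(s) (numbers), and the patient database. The function would then generate a new variable for each diagnosis of interest and assign 1 if the patient (row or observation) has the diagnosis and 0 otherwise.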

Here are the function variables:

x<-c("2851") #ICD9 for Anemia
y<-c("diag_1") #Primary diagnosis 
z<-"Anemia"  #Name of new binary variable for patient dataframe
i<-patient_db #patient dataframe

patient<-c("a","b","c")
diag_1<-c("8661", "2851","8651")
diag_2<-c("8651","8674","2866")
diag_3<-c("2430","3456","9089")

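# data.frame() is base R; the original data_frame() needs dplyr/tibble and is deprecated
patient_db<-data.frame(patient,diag_1,diag_2,diag_3,stringsAsFactors=FALSE)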

  patient  diag_1 diag_2 diag_3
1       a  8661   8651   2430
2       b  2851   8674   3456
3       c  8651   2866   9089

Here is the function:

diagnosis_func<-function(x,y,z,i){

pattern = paste("^(", paste0(x, collapse = "|"), ")", sep = "")

i$z<-ifelse(rowSums(sapply(i[y], grepl, pattern = pattern)) != 0,"1","0")

}

This is what I would like to get after running the function:

  patient  diag_1 diag_2 diag_3  Anemia
1       a  8661   8651   2430      0
2       b  2851   8674   3456      1
3       c  8651   2866   9089      0

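The lines inside the function have been tested outside the function and work as expected. Where I am stuck is getting the function itself to work. Any help would be greatly appreciated.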

Happy New Year,

Albit

1 Answer:

Answer 0: (score: 1)

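This will work if you intend to use only one diagnosis at a time. I took the liberty of renaming the arguments so they are easier to work with in the code.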

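diagnosis_func <- function(data, target_col, icd, new_col){
  # Build one regex that matches any of the codes at the start of the string
  pattern <- sprintf("^(%s)", 
                     paste0(icd, collapse = "|"))

  # [[new_col]] uses the value of new_col as the column name; + 0L turns TRUE/FALSE into 1/0
  data[[new_col]] <- grepl(pattern = pattern, 
                           x = data[[target_col]]) + 0L
  # Return the modified copy so the caller can assign it
  data
}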

diagnosis_func(patient_db, "diag_1", "2851", "Anemia")

# Multiple codes for a single diagnosis
diagnosis_func(patient_db, "diag_1", c("8661", "8651"), "Dx")
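
For context on why the attempt in the question does not produce an `Anemia` column, here is a minimal sketch in base R. The names `df`, `new_name`, `with_dollar` and `with_brackets` are illustrative only and do not come from the question. It shows that `$` takes the column name literally, while `[[ ]]` uses the value stored in the variable.

```r
# Toy data frame; `df` and `new_name` are illustrative names only
df <- data.frame(diag_1 = c("8661", "2851", "8651"),
                 stringsAsFactors = FALSE)
new_name <- "Anemia"

# `$` takes the name literally: this creates a column called "new_name"
with_dollar <- df
with_dollar$new_name <- 0L
names(with_dollar)    # "diag_1" "new_name"

# `[[ ]]` evaluates the variable: this creates a column called "Anemia"
with_brackets <- df
with_brackets[[new_name]] <- grepl("^(2851)", df$diag_1) + 0L
names(with_brackets)  # "diag_1" "Anemia"
with_brackets$Anemia  # 0 1 0
```

The original function also ends on an assignment, so it returns that value invisibly rather than the data frame. Since R modifies a copy inside the function, the result has to be returned and assigned back, for example `patient_db <- diagnosis_func(patient_db, "diag_1", "2851", "Anemia")`.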

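If you want to harden it a little to guard against inadvertent errors, you can install the checkmate package and use this version. It validates the arguments before doing any work.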

diagnosis_func <- function(data, target_col, icd, new_col){

  coll <- checkmate::makeAssertCollection()

  checkmate::assert_class(x = data,
                          classes = "data.frame",
                          add = coll)

  checkmate::assert_character(x = target_col,
                              len = 1,
                              add = coll)

  checkmate::assert_character(x = icd,
                              add = coll)

  checkmate::assert_character(x = new_col,
                              len = 1,
                              add = coll)

  checkmate::reportAssertions(coll)

  pattern <- sprintf("^(%s)", 
                     paste0(icd, collapse = "|"))

  data[[new_col]] <- grepl(pattern = pattern, 
                           x = data[[target_col]]) + 0L
  data
}

diagnosis_func(patient_db, "diag_1", "2851", "Anemia")
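
If several diagnoses or several diagnosis columns are needed, one possible extension is sketched below. This is not part of the answer above: `flag_diagnosis`, `diagnoses` and `flagged` are hypothetical names, the sketch uses base R only, and the `Dx` entry simply reuses the example codes from the earlier call.

```r
# Example data from the question, rebuilt with base R
patient_db <- data.frame(
  patient = c("a", "b", "c"),
  diag_1  = c("8661", "2851", "8651"),
  diag_2  = c("8651", "8674", "2866"),
  diag_3  = c("2430", "3456", "9089"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: flag a diagnosis if ANY of the target columns matches
flag_diagnosis <- function(data, target_cols, icd, new_col) {
  pattern <- sprintf("^(%s)", paste0(icd, collapse = "|"))
  # One logical vector per column, then combine them row-wise with OR
  hits <- lapply(target_cols, function(col) grepl(pattern, data[[col]]))
  data[[new_col]] <- Reduce(`|`, hits) + 0L
  data
}

# Named list: new column name -> ICD9 code prefixes
diagnoses <- list(Anemia = "2851", Dx = c("8661", "8651"))

flagged <- patient_db
for (name in names(diagnoses)) {
  flagged <- flag_diagnosis(flagged, c("diag_1", "diag_2", "diag_3"),
                            diagnoses[[name]], name)
}
flagged
```

Each pass through the loop adds one 0/1 column, so the list of diagnoses can grow without changing the helper.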