What is the proper way to update my data.frame?

Time: 2016-12-15 10:34:56

Tags: r

I'm having trouble updating my data.frame and I don't know how to do it. I have a function that is supposed to update my data.frame:

# Tries to update the data.frame.
updateTable <- function(sample) {
  sampleName = sample[1]
  sex = sample[2]
  dob = sample[3]

  cat("UPDATE ENTRY:
     Current sample: ",sampleName,"
     Sex           : ",sex,"
     Day of Birth  : ",dob," (yyyy-mm-dd)
     ")
  age = getAge(as.Date(dob))
  occurences = which(test_data[,"Name"] == sampleName)
  test_data[occurences,"Age"] <- age
  test_data[occurences,"Sex"] <- sex

  # I tried this, but it returns n data.frames, one for each element of the test_patients list.
  #return(test_data)

  # And this returns a list with data.frames for each test_patient.
  #test_data[which(sampleName == test_data$Name),] <-c(Name=sampleName, Sex=sex, Age=age)

  # I want to return one data.frame, containing the updated information for each test_patient.

}
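Neither variant changes the global test_data because R functions work on copies: assigning to test_data[...] inside the function creates a modified local copy that is thrown away when the function returns. A minimal sketch of the usual pattern, passing the data frame in and returning it; the helper set_sex() is hypothetical and only illustrates the idea:

```r
# Demonstrates R's copy-on-modify behaviour for data frames inside functions.
df <- data.frame(Name = c("Anita", "Bert"), Sex = NA)

# Hypothetical helper: modifies its argument and returns the modified copy.
set_sex <- function(d, name, sex) {
  d[d$Name == name, "Sex"] <- sex  # changes only the local copy `d`
  d                                # return it so the caller can keep it
}

updated <- set_sex(df, "Anita", "0")

print(df$Sex)       # the original is untouched: NA NA
print(updated$Sex)  # the returned copy holds the change: "0" NA
```

The caller then decides what to keep, e.g. df <- set_sex(df, "Anita", "0"), with no need for <<- or assign().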

And here is a function that calculates a person's age from their date of birth and the current date:

# Calculates the age of a person given his/her birthdate.
getAge <- function(dob)
{
  currentDate = as.Date("2016-12-14")
  lt <- data.frame(dob, currentDate)
  age <- as.numeric(format(lt[,2],format="%Y")) - as.numeric(format(lt[,1],format="%Y"))

  dayOncurrentDateYear <- ifelse(format(lt[,1],format="%m-%d")!="02-29",
                           as.Date(paste(format(lt[,2],format="%Y"),"-",format(lt[,1],format="%m-%d"),sep="")),
                           ifelse(as.numeric(format(currentDate,format="%Y")) %% 400 == 0 | as.numeric(format(currentDate,format="%Y")) %% 100 != 0 & as.numeric(format(currentDate,format="%Y")) %% 4 == 0,
                                  as.Date(paste(format(lt[,2],format="%Y"),"-",format(lt[,1],format="%m-%d"),sep="")),
                                  as.Date(paste(format(lt[,2],format="%Y"),"-","02-28",sep=""))))

  age[which(dayOncurrentDateYear > lt$currentDate)] <- age[which(dayOncurrentDateYear > lt$currentDate)] - 1

  return(age)
}
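For comparison, the same rule (use this year's birthday, treating 29 February as 28 February in non-leap years) can be written more compactly. This is only a sketch; the name age_on() is hypothetical and it keeps the same fixed reference date as getAge():

```r
# Age in completed years on date `on`, following the same rule as getAge():
# a 29 February birthday counts as 28 February in non-leap years.
age_on <- function(dob, on = as.Date("2016-12-14")) {
  dob  <- as.Date(dob)
  year <- as.integer(format(on, "%Y"))
  leap <- (year %% 4 == 0 & year %% 100 != 0) | year %% 400 == 0

  # Month-day of each birthday, shifted to 02-28 when this year has no 02-29.
  md <- format(dob, "%m-%d")
  md[md == "02-29" & !leap] <- "02-28"
  birthday <- as.Date(paste0(year, "-", md))

  # Year difference, minus one where this year's birthday is still ahead.
  (year - as.integer(format(dob, "%Y"))) - as.integer(birthday > on)
}

age_on(c("2000-01-01", "1959-01-01", "1960-01-01"))  # 16 57 56
```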

Now the input data:

test_data <- data.frame(Name=c("Anita", "Bert", "Cornel"), Sex=c(NA), Age=c(NA))
test_patients <- list( c("Anita", 0, "2000-01-01"), c("Bert", 1, "1959-01-01"), c("Cornel", 1, "1960-01-01") )
test_data = lapply(test_patients, updateTable)
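lapply() always returns a list with one element per input, which is why this call replaces test_data with a list. To thread a single data frame through every patient, the update function can take the data frame as its first argument and return it, and Reduce() can feed each result into the next call. A sketch, assuming the hypothetical names update_one() and age_at(); age_at() is a simplified stand-in for getAge() that does not apply the 29 February rule:

```r
test_data <- data.frame(Name = c("Anita", "Bert", "Cornel"), Sex = NA, Age = NA)
test_patients <- list(c("Anita", 0, "2000-01-01"),
                      c("Bert", 1, "1959-01-01"),
                      c("Cornel", 1, "1960-01-01"))

# Simplified stand-in for getAge(): completed years on `on`
# (unlike getAge(), it does not treat 29 February as 28 February).
age_at <- function(dob, on = as.Date("2016-12-14")) {
  dob <- as.Date(dob)
  years <- as.integer(format(on, "%Y")) - as.integer(format(dob, "%Y"))
  years - as.integer(format(on, "%m-%d") < format(dob, "%m-%d"))
}

# Takes the whole data frame plus one patient, returns the updated data frame.
update_one <- function(df, sample) {
  rows <- df$Name == sample[1]
  df[rows, "Sex"] <- sample[2]
  df[rows, "Age"] <- age_at(sample[3])
  df
}

# Reduce() calls update_one(update_one(test_data, p1), p2), ... and so on.
test_data <- Reduce(update_one, test_patients, test_data)
print(test_data)
```

Nothing outside update_one() is modified, so there is no need for <<- or assign(); the single assignment to test_data happens at the call site.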

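I have a few ideas about how to achieve this, but I would like to know what the proper way to do it is. I'm not very experienced with R and I don't have a book, so I thought I'd ask my question here.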

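  1. Extend this function into a nested function inside lapply, with a tmp.df for each person, and then add each converted tmp.df as a row to test_data before returning the whole data.frame.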
  2. Something like this (which does not work):

    test_data = lapply(test_patients, function(patient) {
      tmp.df = NULL 
      tmp.df = updateTable(patient)
      rbind.data.frame(test_data[which(sampleName == test_data$Name),], tmp.df)
    })
    
  3. Set the data.frame from inside this function via <<- or assign("a", "new", envir = .GlobalEnv), as I have seen in other posts (but that is very ugly).
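Ideas 1 and 2 amount to building one small data frame per patient and stacking them. In base R this is usually done with lapply() followed by a single do.call(rbind, ...), rather than calling rbind() inside the loop. A sketch; the column name DOB and the variable names are assumptions:

```r
test_patients <- list(c("Anita", 0, "2000-01-01"),
                      c("Bert", 1, "1959-01-01"),
                      c("Cornel", 1, "1960-01-01"))

# One single-row data frame per patient vector.
rows <- lapply(test_patients, function(p) {
  data.frame(Name = p[1], Sex = p[2], DOB = as.Date(p[3]),
             stringsAsFactors = FALSE)
})

# Stack them all at once into one data frame (3 rows, 3 columns).
patients_df <- do.call(rbind, rows)
print(patients_df)
```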
So, dear internet: who can teach me how to handle this?

Kind regards

EDIT:

test_data is a subset of my original data, so here is a slightly extended subset. I am struggling to apply denrou's answer for the test case to my own data.

      test_data <- data.frame(Name=c(rep(c("Anita", "Bert", "Cornel"),4)), Sex=c(NA), Age=c(NA), Sample_ID=c(rep(1:12,1)), Time=c(rep(1:4,3)) )
      test_data[order(test_data[,"Time"]),]
      
           Name Sex Age Sample_ID Time
      1   Anita  NA  NA         1    1
      5    Bert  NA  NA         5    1
      9  Cornel  NA  NA         9    1
      2    Bert  NA  NA         2    2
      6  Cornel  NA  NA         6    2
      10  Anita  NA  NA        10    2
      3  Cornel  NA  NA         3    3
      7   Anita  NA  NA         7    3
      11   Bert  NA  NA        11    3
      4   Anita  NA  NA         4    4
      8    Bert  NA  NA         8    4
      12 Cornel  NA  NA        12    4
      

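So how would I implement his/her answer if I am not creating a new data.frame from the test_patients vectors, but instead want the test_patients information stored in the existing test_data data.frame?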

Because doing:

      library(dplyr)
      test_patients <- list( c("Anita", 0, "2000-01-01"), c("Bert", 1, "1959-01-01"), c("Cornel", 1, "1960-01-01") )
      
      # This function takes a vector and returns a one-row data frame
      to_dataframe <- function(info) tibble(Name = info[1], Sex = info[2], Birthdate = info[3])
      
      # Now I can turn your patient list into a dataframe
      test_data <- lapply(test_patients, to_dataframe) %>% bind_rows()
      
      # And I can calculate the age of a patient with your function
      test_data <- test_data %>% 
        mutate(Age = getAge(as.Date(Birthdate))) %>% 
        select(Name, Sex, Age) 
      

results in:

      # A tibble: 3 × 3
          Name   Sex   Age
         <chr> <chr> <dbl>
      1  Anita     0     9
      2   Bert     1     9
      3 Cornel     1     9
      

Please let me know if I'm no longer making sense; I find it hard to describe these things.
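To keep the existing 12-row test_data and pull the patient information into it, the one-row-per-patient table can be joined on Name. A base-R sketch using merge() (dplyr's left_join() plays the same role); to keep it self-contained, the ages that getAge() returns for the three test patients (16, 57 and 56) are typed in directly, and the table name patients is hypothetical:

```r
test_data <- data.frame(Name = rep(c("Anita", "Bert", "Cornel"), 4),
                        Sex = NA, Age = NA,
                        Sample_ID = 1:12, Time = rep(1:4, 3),
                        stringsAsFactors = FALSE)

# One row per patient; Age values are what getAge() gives on 2016-12-14.
patients <- data.frame(Name = c("Anita", "Bert", "Cornel"),
                       Sex = c("0", "1", "1"),
                       Age = c(16, 57, 56),
                       stringsAsFactors = FALSE)

# Drop the empty placeholder columns, otherwise merge() yields Sex.x/Sex.y.
keep <- setdiff(names(test_data), c("Sex", "Age"))

# Left join: keep every row of test_data, add Sex and Age by Name.
test_data <- merge(test_data[keep], patients, by = "Name", all.x = TRUE)

# merge() sorts by the key, so restore the original sample order.
test_data <- test_data[order(test_data$Sample_ID), ]
rownames(test_data) <- NULL
print(test_data)
```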

EDIT2:

For anyone who stumbles upon this question looking for an answer: I can't give you an apply-family solution, but here is something that may help you:

      for (patient in test_patients) {
      
        # Set variables.
        sampleName = patient[1]
        sex = patient[2]
        dob = patient[3]
        age = getAge(as.Date(dob))
      
        # Set reference.
        occurrences = which(test_data$Name == sampleName)
      
        # Update table.
        test_data[occurrences,"Age"] <- age
        test_data[occurrences,"Sex"] <- sex
      }
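The loop can also be replaced by a single vectorized lookup: match() returns, for every row of test_data, the position of its Name in a per-patient table, and indexing that table's columns with those positions fills all rows at once while keeping the row order. A sketch, assuming a hypothetical per-patient data frame patients with the getAge() results (16, 57 and 56) typed in:

```r
test_data <- data.frame(Name = rep(c("Anita", "Bert", "Cornel"), 4),
                        Sex = NA, Age = NA,
                        Sample_ID = 1:12, Time = rep(1:4, 3),
                        stringsAsFactors = FALSE)

# Hypothetical per-patient table (ages as returned by getAge() on 2016-12-14).
patients <- data.frame(Name = c("Anita", "Bert", "Cornel"),
                       Sex = c("0", "1", "1"),
                       Age = c(16, 57, 56),
                       stringsAsFactors = FALSE)

# For each row of test_data, the index of the matching patient (NA if none).
idx <- match(test_data$Name, patients$Name)

# Fill both columns for all rows in one step; row order is unchanged.
test_data$Sex <- patients$Sex[idx]
test_data$Age <- patients$Age[idx]
print(test_data)
```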
      

1 answer:

Answer 0 (score: 0)

This is how I would do it:

library(dplyr)
test_patients <- list( c("Anita", 0, "2000-01-01"), c("Bert", 1, "1959-01-01"), c("Cornel", 1, "1960-01-01") )

# This function takes a vector and returns a one-row data frame
to_dataframe <- function(info) tibble(Name = info[1], Sex = info[2], Birthdate = info[3])

# Now I can turn your patient list into a dataframe
test_data <- lapply(test_patients, to_dataframe) %>% bind_rows()

# And I can calculate the age of a patient with your function
test_data <- test_data %>% 
  mutate(Age = getAge(as.Date(Birthdate))) %>% 
  select(Name, Sex, Age) 

Edit

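The last command, select(Name, Sex, Age), keeps only the columns given as arguments (see ?dplyr::select). You can easily change it to select just the columns you need, or remove it to keep everything: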

test_data <- test_data %>% 
  mutate(Age = getAge(as.Date(Birthdate)))