Question

我有一个数据集a，如下所示

       Dictionary      ActMin   ActMax
             3145      5        10
             32441     10       19
             3245      25       32
             416356    37       46
             4H22      82       130
             %ABC      1        27

我有另一个数据集b，如下所示

             ID        Test         Obs     Year
             1         3145-MN      11      1994  
             2         3145-NY      17      1992
             1         416356-FL    57      1995
             1         32441-MN     13      1995
             2         3145-MN      8       1993
             2         3245-NY      27      1983
             3         3245-FL      45      2003
             2         3145-MN      6       2001
             3         %ABC-NY      33      1996
             4         4H22-TX      97      1984

我要做的是像这样产生output

            Id         Test         Obs     Results   Year   Description 
            1          3145-MN      11      High      1994   Tested 3145 High on 1994, 4163 High on 1995,    
            2          3145-NY      17      High      1992   Tested 3145 High on 1992
            1          416356-FL    57      High      1995
            1          32441-MN     13      Normal    1995
            2          3145-MN      8       Normal    1993
            2          3245-NY      27      Normal    1983
            3          3245-FL      45      High      2003   Tested 3245 High on 2003
            2          3145-MN      6       Normal    2001
            3          %ABC-NY      33      High      1996
            4          4H22-TX      27      Normal    1984

第一个数据集a是一个字典，用于存储唯一的测试号3145，3244等及其Minimum和Maximum值

第二个数据集b是实际测试结果数据集，用于存储实际观察到的结果。将b中特定测试的观察值与数据集a中的最小值和最大值进行比较。如果b中的观察值大于a中的实际最小值和最大值，则结果列应更新为high，否则为Normal。 description列应提供每个ID列出的测试摘要（每个ID的1个摘要）。

需要有关此复杂循环以及if语句和结果聚合的帮助。

Answer 1

有点费解，但结果应该与你提出的相似。我设法在基础R中获得result列，但对于Description，我必须使用data.table。

 b$result<-c("Normal","High")[(b$Obs > a$ActMax[match(substr(b$Test,1,4),as.character(a$Dictionary))])+1]
 require(data.table)
 setDT(b)
 b[,Description:=gsub("(, )+$","",c(paste(ifelse(result=="High",paste("Tested",substring(Test,1,4),result,"on",Year),""),collapse=", "),rep("",.N-1))),by=ID]

Answer 2

通过使用dplyr，可以发现代码更具可读性：

library(dplyr)
df_result <-
  b %>%
  ## EDIT mutate( Dictionary = as.numeric(substring(Test,1,4)) ) %>%  
  mutate( Dictionary = as.numeric( gsub("[A-Z,-]+", "", Test )) ) %>%  
  inner_join(a, by = "Dictionary") %>%
  mutate( Results = ifelse( Obs > pmax(ActMin, ActMax), yes = "High", no = "Normal" )) 

df_description <-
  df_result %>%
  filter( Results == "High") %>%
  group_by(ID) %>%
  summarise( 
    Results = Results[1],
    Dictionary = min(Dictionary),
    Description = paste("Tested", Dictionary, "on", Year,collapse = ","))

df_final <- 
  df_result %>%
  left_join( df_description, by = c("ID","Dictionary", "Results")) %>%
  select(ID, Test, Obs, Results, Year, Description)

嵌套循环和多个if语句R.

2 个答案: