Question

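I have 10 years of contribution data in R. The dollar amounts are grouped by ID# (the person who gave the gift) and by the year given. Not every person gave a gift every year. For each row, I want to indicate whether the row (gift) is a first contribution (the person never gave before), the same as the previous year, higher than the previous year, lower than the previous year, or has no gift in the previous year (although the person gave in an earlier year). In addition, I want to flag whether the person who gave this gift did not give a gift in the following year.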

So, if the data looks like this:

ID#          YEAR          GIFT
1               2005          $10
1               2006          $5
1               2008          $15
1               2009          $20
1               2010          $20


the result should be:

ID#          YEAR          GIFT          STATUS
1               2005          $10          FIRST
1               2006          $5           LOWER         also    NO NEXT YEAR
1               2008          $15          PREVIOUS GIVER
1               2009          $20          HIGHER
1               2010          $20          SAME

Thanks!

Answer 1

Here is a solution that uses dplyr plus a helper function that determines the status, which keeps the code clearer. The data:

data <- read.table(text="ID          YEAR          GIFT
1               2005          $10
1               2006          $5
1               2008          $15
1               2009          $20
1               2010          $20", header=TRUE)

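To get the output you want, we have to compare each value (this) with the previous year's value (prev) and the following year's value (follow), and check whether it is the first or last gift of its group.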
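As a small illustration of what "previous" means here, the sketch below looks up a gift by year rather than by row position. The vectors years and amounts are hypothetical stand-ins for one donor's data. When the year being looked up has no gift, the subset comes back as a zero-length vector rather than NA, which is why the function that follows tests length(prev) < 1.

```r
# Hypothetical single-donor data, used only for illustration
years   <- c(2005, 2006, 2008)
amounts <- c(10, 5, 15)

# Gift from the year before 2006: 2005 is present, so one value comes back
prev_2006 <- amounts[which(years == 2006 - 1)]

# Gift from the year before 2008: 2007 is absent, so the result is empty
prev_2008 <- amounts[which(years == 2008 - 1)]

print(prev_2006)
print(length(prev_2008))
```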

getStatus <- function(first, prev, this, follow, last) {  
  if (first) {
    status <- "FIRST" #Easy one
  } else if (length(prev) < 1 || is.na(prev)) { #Not the first, but prev missing
    status <- "PREVIOUS GIVER"
  } else if (this < prev) { #The next 3 are obvious
    status <- "LOWER"
  } else if (this == prev) {
    status <- "SAME"
  } else if(this > prev) {
    status <- "HIGHER"
  }
  if ((length(follow) < 1 || is.na(follow)) && !last) { #No next but isn't last
    status <- paste(status, "also NO NEXT YEAR")
  }  
  return(status)
}
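A note on the missing-value checks in the function: the order of the two tests matters. The sketch below isolates that guard in a helper called is_missing, a name invented here for illustration and not part of the answer's code. Because || stops as soon as the left side is TRUE, is.na() is never evaluated on an empty vector, where it would return logical(0) and make if() fail.

```r
# Hypothetical helper that isolates the guard used for prev and follow
is_missing <- function(x) {
  # The length test comes first: || short-circuits, so is.na() is only
  # reached when x holds at least one value
  length(x) < 1 || is.na(x)
}

print(is_missing(numeric(0)))  # no gift found for that year
print(is_missing(NA))          # gift recorded as NA
print(is_missing(15))          # a real gift amount
```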

Now that we have our function, we have to process the data. We will use dplyr to make things more readable.

library(dplyr)

result <- data %>%
  mutate(gift.num = GIFT %>% gsub("\\$", "", .) %>% as.numeric) %>% #Create a column with the gifts as numbers
  arrange(ID, YEAR) %>% #We make sure YEAR is sorted ascending within each ID
  group_by(ID) %>%
  mutate(RESULT = sapply(YEAR, function(y) {
    #Apply getStatus to each row; inside the grouped mutate, YEAR and gift.num
    #hold only the current ID's rows (.$YEAR would refer to the whole data frame)
    getStatus(first(YEAR) == y, gift.num[which(YEAR == y - 1)],
              gift.num[which(YEAR == y)], gift.num[which(YEAR == y + 1)],
              last(YEAR) == y)
  })) %>%
  select(-gift.num) #Removing the dummy column

This gives us:

  ID YEAR GIFT                  RESULT
1  1 2005  $10                   FIRST
2  1 2006   $5 LOWER also NO NEXT YEAR
3  1 2008  $15          PREVIOUS GIVER
4  1 2009  $20                  HIGHER
5  1 2010  $20                    SAME

More data would help to make sure all the scenarios are covered, but even if some are not, you should be able to fix any errors.
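If it helps for checking more scenarios, here is a rough vectorized sketch in base R that applies the same rules without calling a function per row. It is an alternative, not the code from the answer above. Donor 1 is the question's data; donor 2 is made up to exercise the multi-donor case. The variable names (gifts, amount, key and so on) are assumptions chosen for this example.

```r
# Donor 1 comes from the question; donor 2 is hypothetical test data
gifts <- data.frame(
  ID   = c(1, 1, 1, 1, 1, 2, 2),
  YEAR = c(2005, 2006, 2008, 2009, 2010, 2007, 2009),
  GIFT = c("$10", "$5", "$15", "$20", "$20", "$50", "$50"),
  stringsAsFactors = FALSE
)
gifts <- gifts[order(gifts$ID, gifts$YEAR), ]

amount <- as.numeric(gsub("\\$", "", gifts$GIFT))
key    <- paste(gifts$ID, gifts$YEAR)

# Amount the same donor gave one year earlier (NA when there was none)
prev_amount <- amount[match(paste(gifts$ID, gifts$YEAR - 1), key)]
# Whether the same donor gave again in the following year
has_next <- paste(gifts$ID, gifts$YEAR + 1) %in% key

# First and last row of each donor (data is sorted by ID, then YEAR)
is_first <- !duplicated(gifts$ID)
is_last  <- !duplicated(gifts$ID, fromLast = TRUE)

status <- ifelse(is_first, "FIRST",
          ifelse(is.na(prev_amount), "PREVIOUS GIVER",
          ifelse(amount < prev_amount, "LOWER",
          ifelse(amount == prev_amount, "SAME", "HIGHER"))))

# Same rule as getStatus: flag a missing next year unless it is the last gift
no_next <- !has_next & !is_last
status[no_next] <- paste(status[no_next], "also NO NEXT YEAR")

gifts$STATUS <- status
print(gifts)
```

Matching on an ID-plus-year key keeps the lookups tied to calendar years instead of row positions, so a gap such as 2006 to 2008 is treated as "no gift in the previous year" rather than as a comparison between adjacent rows.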
