Dplyr R - Multiple conditions when using distinct(), or something else entirely?

Posted: 2016-08-21 06:10:15

Tags: r conditional distinct

I should say up front that I am not set on using distinct() to solve my problem; I am open to any suggestion that works. Here is the puzzle:

Date <- c(1,1,2,2)
Group <- c("A","A","B","B")
Result <- c("Aa","Ab","Aa","SB")
df <- cbind(Date, Group, Result)
df
     Date Group Result
[1,] "1"  "A"   "Aa"  
[2,] "1"  "A"   "Ab"  
[3,] "2"  "B"   "Aa"  
[4,] "2"  "B"   "SB" 

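What I am aiming for is one row per distinct Date. Where a Date only has rows with Aa or Ab, either of those rows can be selected, but a row containing SB should always be selected over Aa, Ab, Ac, and so on. I am having a lot of trouble doing this efficiently on a large data frame, and I have no worthwhile attempts to show here.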

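In reality, Groups A and B have many more time-based observations, and there are many more distinct groups. When data for the same Group is uploaded two (or more) times on a given Date, there should be only one entry for that Date, keeping the Result that is more important.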

Update:

After filtering, the expected subset of the output above would be either:

     Date Group Result
[1,] "1"  "A"   "Aa"    
[2,] "2"  "B"   "SB" 

OR

     Date Group Result
[1,] "1"  "A"   "Ab"    
[2,] "2"  "B"   "SB" 

2 answers:

Answer 0 (score: 0)

Using dplyr, but not distinct():

library(dplyr)

Date <- c(1,1,2,2)
Group <- c("A","A","B","B")
Result <- c("Aa","Ab","Aa","SB")
# Use data.frame() rather than cbind(): cbind() produces a character matrix
df <- data.frame(Date, Group, Result)

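# Intended to give your first answer, but first() keeps the first row of each
# Date/Group pair, which is Aa for both dates here (not SB for Date 2)
summarise(group_by(df, Date, Group),
          Result = first(Result))

# To get your second answer: last() gives Ab for Date 1 and SB for Date 2,
# but only because SB happens to be the last row of its group in this data
summarise(group_by(df, Date, Group),
          Result = last(Result))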

# To combine all the options
summarise(group_by(df, Date, Group), 
                   Result = paste(Result, collapse = ", "))
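Because first() and last() just take whichever row happens to come first or last in each group, one way to make the choice deterministic is to sort by importance before picking. The sketch below does this in base R, assuming importance can be expressed as a number; the `priority` column and its values (SB = 2, everything else = 1) are hypothetical. It sorts so the most important Result comes first within each Date/Group pair, then drops the later duplicates with duplicated(). Both steps are vectorised, which should scale reasonably to large data frames.

```r
# Example data from the question, as a data frame rather than a matrix
df <- data.frame(
  Date   = c(1, 1, 2, 2),
  Group  = c("A", "A", "B", "B"),
  Result = c("Aa", "Ab", "Aa", "SB"),
  stringsAsFactors = FALSE
)

# Hypothetical importance score: SB outranks every other Result
df$priority <- ifelse(df$Result == "SB", 2, 1)

# Sort by Date and Group, then by decreasing priority within each pair
sorted <- df[order(df$Date, df$Group, -df$priority), ]

# Keep only the first (most important) row of each Date/Group pair
best <- sorted[!duplicated(sorted[c("Date", "Group")]),
               c("Date", "Group", "Result")]
rownames(best) <- NULL
print(best)
```

With this data the result is Aa for Date 1 and SB for Date 2. Ties (Aa vs Ab) are resolved by the original row order, since order() is stable.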

Answer 1 (score: 0)

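The distinct Results need to be ranked by importance. This can be done manually or with some kind of algorithm; both approaches are shown below. The ranking is then used to find the highest-ranked Result for each Date/Group combination. The code might look like this: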

library(dplyr)

# Convert the character matrix from the question into a data frame
df <- data.frame(df)

# Option 1: manually list the unique Results in order of increasing importance
Result_rank <- c("Aa", "Ab", "SB")

# Option 2: rank the unique Results with an algorithm instead;
# for this example, every Result starting with "A" ranks below "SB"
Result_rank <- c(grep("^A", unique(df$Result), value = TRUE),
                 grep("SB", unique(df$Result), value = TRUE))

# For each Date and Group, keep the highest-ranked Result:
# match() gives each Result's position in Result_rank, max() picks the largest
df_important <- df %>%
  group_by(Date, Group) %>%
  summarize(Result = Result_rank[max(match(Result, Result_rank))])

which gives:

   Date  Group Result
  <fctr> <fctr>  <chr>
1      1      A     Ab
2      2      B     SB
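If the full matching rows are wanted rather than a summary, the same match()-based ranking can filter instead of summarise. The following is a base-R sketch, assuming the ranking Aa < Ab < SB from the code above and that every Result appears in the ranking vector (an unlisted Result gets an NA rank). ave() computes the highest rank within each Date/Group pair and repeats it on every row of that pair, so rows can be kept by a simple comparison. Unlike the summarise() version, tied rows are all kept.

```r
# Example data from the question
df <- data.frame(
  Date   = c(1, 1, 2, 2),
  Group  = c("A", "A", "B", "B"),
  Result = c("Aa", "Ab", "Aa", "SB"),
  stringsAsFactors = FALSE
)

# Results in order of increasing importance
Result_rank <- c("Aa", "Ab", "SB")

# Position of each Result in the ranking (assumes every Result is listed)
df$importance <- match(df$Result, Result_rank)

# Highest importance within each Date/Group pair, repeated on every row
group_max <- ave(df$importance, df$Date, df$Group, FUN = max)

# Keep only the rows that reach their pair's highest importance
df_important <- df[df$importance == group_max, c("Date", "Group", "Result")]
print(df_important)
```

For the example data this keeps Ab for Date 1 and SB for Date 2, the same choice as the summarize() output above.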