R: conditionally selecting rows within duplicated IDs

Time: 2018-12-13 01:36:10

Tags: r

I have a data frame with duplicated IDs and different variables, like this:

x <- 1:10
ID <- c(20,20,20,55,55,45,45,45,45,45)
fruit <- c("Orange", "Apple", "Pear", "Apple", "Blueberries", "Apple", "Banana", "Banana", "Strawberry", "Pear")
# data.frame() rather than cbind(), which would return a character matrix
df <- data.frame(X = x, ID, fruit)

> df
X   ID   fruit
1   20   Orange
2   20   Apple
3   20   Pear
4   55   Apple
5   55   Blueberries
6   45   Apple
7   45   Banana
8   45   Banana
9   45   Strawberry
10  45   Pear

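Within each duplicated ID, I need to conditionally pick one row according to a hierarchy (e.g. Orange > Blueberries > Pear > Banana > Apple > Strawberry), to get: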

X   ID   fruit
1   20   Orange
5   55   Blueberries
10  45   Pear

Honestly, I have no good or simple idea of how to do this. Any ideas?

3 Answers:

Answer 0: (score: 3)

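We arrange by 'ID', then by 'fruit' converted to a factor with the levels specified in the OP's post, then by 'X' in descending order. After that, grouped by 'ID', we slice the first row.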

library(dplyr)
df %>% 
  arrange(ID, factor(fruit, levels = c('Orange', 'Blueberries', 'Pear', 
             'Banana','Apple', 'Strawberry')), desc(X)) %>% 
  group_by(ID) %>% 
  slice(1)
# A tibble: 3 x 3
# Groups:   ID [3]
#      X    ID fruit      
#  <int> <int> <chr>      
#1     1    20 Orange     
#2    10    45 Pear       
#3     5    55 Blueberries
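
For readers who prefer base R, the same idea can be sketched without dplyr. This is only a minimal sketch and not part of the original answer: the data frame is an assumption reconstructed from the table printed in the question, and the names lvls, prio, sorted and result are illustrative.

```r
# Assumed example data, laid out like the table printed in the question.
df <- data.frame(
  X = 1:10,
  ID = c(20, 20, 20, 55, 55, 45, 45, 45, 45, 45),
  fruit = c("Orange", "Apple", "Pear", "Apple", "Blueberries",
            "Apple", "Banana", "Banana", "Strawberry", "Pear"),
  stringsAsFactors = FALSE
)

# Priority order from the question: earlier entries win.
lvls <- c("Orange", "Blueberries", "Pear", "Banana", "Apple", "Strawberry")

# Position of each fruit in the hierarchy (1 = highest priority).
prio <- match(df$fruit, lvls)

# Sort by ID, then by priority, then by X descending (the desc(X) tie-break).
sorted <- df[order(df$ID, prio, -df$X), ]

# Keep only the first row encountered for each ID.
result <- sorted[!duplicated(sorted$ID), ]
result
```

Like the dplyr version, this returns the rows ordered by ID rather than in their original order.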

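Answer 1: (score: 1)

Assuming you want only one row from each group, and that each group contains the fruit required of it, we can store the hierarchy in a separate vector and use mapply to subset by group. Note that the code below pairs the i-th unique ID with the i-th fruit in the hierarchy.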

hierarc_vec <- c("Orange","Blueberries", "Pear", "Banana","Apple","Strawberry")
ids <- unique(df$ID)

# which.max() on a logical vector returns the index of the first TRUE
df[mapply(function(x, y) which.max(df$ID == x & df$fruit == y),
          ids, hierarc_vec[seq_along(ids)]), ]


#    x ID       fruit
#1   1 20      Orange
#5   5 55 Blueberries
#10 10 45        Pear

Data

x <- 1:10
ID <- c(20,20,55,55,55,45,45,45,45,45)
fruit <- c("Orange", "Apple", "Pear", "Apple", "Blueberries", 
           "Apple", "Banana", "Banana", "Strawberry", "Pear")
df <- data.frame(x, ID, fruit)
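
The mapply call above relies on each ID, in order of appearance, containing the fruit at the matching position in the hierarchy. If that cannot be assumed, one possible variation is to rank every fruit with match() and pick the best-ranked row inside each ID. This is a hedged sketch rather than the answerer's method; prio, rows and picked are illustrative names, and it assumes every ID has at least one fruit listed in hierarc_vec.

```r
# Same data as in this answer.
df <- data.frame(
  x = 1:10,
  ID = c(20, 20, 55, 55, 55, 45, 45, 45, 45, 45),
  fruit = c("Orange", "Apple", "Pear", "Apple", "Blueberries",
            "Apple", "Banana", "Banana", "Strawberry", "Pear"),
  stringsAsFactors = FALSE
)
hierarc_vec <- c("Orange", "Blueberries", "Pear", "Banana", "Apple", "Strawberry")

# Position of each fruit in the hierarchy (1 = highest priority).
prio <- match(df$fruit, hierarc_vec)

# For each ID, find the row index whose fruit has the smallest position.
rows <- vapply(
  split(seq_len(nrow(df)), df$ID),
  function(i) i[which.min(prio[i])],
  integer(1)
)

# Subset in original row order.
picked <- df[sort(rows), ]
picked
```

On ties within an ID, which.min() keeps the first matching row.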

Answer 2: (score: 0)

Love them or hate them, this is what factors were designed to do.

library('dplyr')

x <- 1:10
# IDs as in the table printed in the question
ID <- c(20,20,20,55,55,45,45,45,45,45)
fruit <- c("Orange", "Apple", "Pear", "Apple", "Blueberries", "Apple", "Banana", "Banana", "Strawberry", "Pear")
df <- cbind(x, ID, fruit)

df %>%
    as.data.frame() %>%
    # an ordered factor sorts by the position of each level, not alphabetically
    mutate(fruit = factor(
        fruit,
        levels = c('Orange','Blueberries','Pear','Banana','Apple','Strawberry'),
        ordered = TRUE
    )) %>%
    group_by(ID) %>%
    arrange(fruit, ID) %>%
    slice(1)

# A tibble: 3 x 3
# Groups:   ID [3]
#   x     ID    fruit      
#   <fct> <fct> <ord>      
# 1 1     20    Orange     
# 2 10    45    Pear       
# 3 5     55    Blueberries
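
To see why an ordered factor fits this problem, here is a small standalone sketch (the vector f and the name lvls are made up for illustration): once the levels encode the hierarchy, comparisons, min() and the underlying integer codes all follow that order.

```r
# Hierarchy from the question, highest priority first.
lvls <- c("Orange", "Blueberries", "Pear", "Banana", "Apple", "Strawberry")

# A small illustrative vector of fruits as an ordered factor.
f <- factor(c("Apple", "Pear", "Banana"), levels = lvls, ordered = TRUE)

# Integer codes are the positions in lvls.
as.integer(f)

# Comparisons use level order: Apple comes after Pear, so this is TRUE.
f[1] > f[2]

# min() returns the element that comes first in the hierarchy.
min(f)
```

Here min(f) is Pear, the highest-priority fruit present in the vector.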