I have a data frame with repeated IDs and differing variables, as shown below:
x <- 1:10
ID <- c(20,20,20,55,55,45,45,45,45,45)
fruit <- c("Orange", "Apple", "Pear", "Apple", "Blueberries", "Apple", "Banana", "Banana", "Strawberry", "Pear")
df <- data.frame(x, ID, fruit)
> df
x ID fruit
1 20 Orange
2 20 Apple
3 20 Pear
4 55 Apple
5 55 Blueberries
6 45 Apple
7 45 Banana
8 45 Banana
9 45 Strawberry
10 45 Pear
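Within each repeated ID, I need to pick one row conditionally, according to a hierarchy (e.g., Orange > Blueberries > Pear > Banana > Apple > Strawberry), to get:
x ID fruit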
1 20 Orange
5 55 Blueberries
10 45 Pear
Honestly, I have no good or simple idea of how to do this. Any thoughts?
Answer 0 (score: 3)
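We arrange by 'ID', then by 'fruit' converted to a factor with the levels specified in the OP's post, then by 'x' in descending order. After that we group by 'ID' and slice the first row.
library(dplyr)
df %>%
  # sort by ID, then by position in the hierarchy, then by x (largest first)
  arrange(ID, factor(fruit, levels = c('Orange', 'Blueberries', 'Pear',
                                       'Banana', 'Apple', 'Strawberry')), desc(x)) %>%
  group_by(ID) %>%
  slice(1)  # keep the top-ranked row of each ID
# A tibble: 3 x 3
# Groups: ID [3]
# x ID fruit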
# <int> <int> <chr>
#1 1 20 Orange
#2 10 45 Pear
#3 5 55 Blueberries
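To make the sorting step less opaque, here is a minimal base R sketch of how a factor with explicit levels turns the hierarchy into integer ranks. The names `hierarchy` and `fruit_45` are illustrative, and the sample values are the fruits listed for ID 45 in the question.

```r
# Hierarchy from the question, highest priority first (illustrative name)
hierarchy <- c("Orange", "Blueberries", "Pear", "Banana", "Apple", "Strawberry")

# Fruits recorded for ID 45 in the question's data
fruit_45 <- c("Apple", "Banana", "Banana", "Strawberry", "Pear")

# A factor stores each value as the position of its level
f <- factor(fruit_45, levels = hierarchy)
ranks <- as.integer(f)
ranks
# 5 4 4 6 3

# Sorting by the factor therefore sorts by priority; the first element wins
top <- fruit_45[order(f)][1]
top
# "Pear"
```

Sorting on the factor rather than on the raw strings is what keeps the order from falling back to alphabetical.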
Answer 1 (score: 1)
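Assuming you want only one row from each group, and that each group contains the fruit paired with it (the first distinct ID with the first fruit in the hierarchy, the second with the second, and so on), we can store the hierarchy in a separate vector and use mapply to subset by group.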
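# hierarchy, highest priority first
hierarc_vec <- c("Orange","Blueberries", "Pear", "Banana","Apple","Strawberry")
ids <- unique(df$ID)
# pair the i-th distinct ID with the i-th fruit and take the first matching row
df[mapply(function(x, y) which.max(df$ID == x & df$fruit == y),
          ids, hierarc_vec[seq_along(ids)]), ]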
# x ID fruit
#1 1 20 Orange
#5 5 55 Blueberries
#10 10 45 Pear
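If the pairing assumption does not hold for your data, a more general base R sketch is to rank every fruit with match(), sort by ID and rank, and keep the first row of each ID. This is a different technique from the mapply call, offered only as an illustration. The names `hierarchy`, `fruit_rank`, `sorted` and `best` are made up for the example, and the data frame is the one from this answer's data section.

```r
# Hierarchy from the question, highest priority first (illustrative name)
hierarchy <- c("Orange", "Blueberries", "Pear", "Banana", "Apple", "Strawberry")

df <- data.frame(
  x = 1:10,
  ID = c(20, 20, 55, 55, 55, 45, 45, 45, 45, 45),
  fruit = c("Orange", "Apple", "Pear", "Apple", "Blueberries",
            "Apple", "Banana", "Banana", "Strawberry", "Pear"),
  stringsAsFactors = FALSE
)

# Position of each fruit in the hierarchy (1 = highest priority);
# a fruit missing from the hierarchy would get NA and sort last
fruit_rank <- match(df$fruit, hierarchy)

# Sort by ID, then by rank, and keep the first row seen for each ID
sorted <- df[order(df$ID, fruit_rank), ]
best <- sorted[!duplicated(sorted$ID), ]
best
```

With this data, `best` holds rows 1 (Orange), 10 (Pear) and 5 (Blueberries), ordered by ID rather than by first appearance.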
Data
x <- 1:10
ID <- c(20,20,55,55,55,45,45,45,45,45)
fruit <- c("Orange", "Apple", "Pear", "Apple", "Blueberries",
"Apple", "Banana", "Banana", "Strawberry", "Pear")
df <- data.frame(x, ID, fruit)
Answer 2 (score: 0)
Love them or hate them, this is what factors were designed to do.
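library('dplyr')
x <- 1:10
ID <- c(20,20,55,55,45,45,45,45,45,45)
fruit <- c("Orange", "Apple", "Pear", "Apple", "Blueberries", "Apple", "Banana", "Banana", "Strawberry", "Pear")
# cbind() yields a character matrix, so it is converted to a data frame below
df <- cbind(x, ID, fruit)
df %>%
  as.data.frame() %>%
  # encode the hierarchy as an ordered factor: earlier levels rank higher
  mutate(fruit = factor(
    fruit,
    levels = c('Orange','Blueberries','Pear','Banana','Apple','Strawberry'),
    ordered = TRUE
  )) %>%
  group_by(ID) %>%
  arrange(fruit, ID) %>%  # arrange() ignores the grouping by default
  slice(1)                # first (highest-priority) row of each ID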
# A tibble: 3 x 3
# Groups: ID [3]
# x ID fruit
# <fct> <fct> <ord>
# 1 1 20 Orange
# 2 5 45 Blueberries
# 3 3 55 Pear
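The `<fct>` column types in that output come from building the data with cbind(). A short sketch, using illustrative names `m` and `df2`, shows the difference from calling data.frame() directly. stringsAsFactors = FALSE is passed explicitly because its default changed in R 4.0.0.

```r
x <- 1:10
ID <- c(20, 20, 55, 55, 45, 45, 45, 45, 45, 45)
fruit <- c("Orange", "Apple", "Pear", "Apple", "Blueberries",
           "Apple", "Banana", "Banana", "Strawberry", "Pear")

# cbind() on mixed vectors coerces everything to one type: a character matrix
m <- cbind(x, ID, fruit)
is.matrix(m)
# TRUE
typeof(m)
# "character"

# data.frame() keeps each column's own type
df2 <- data.frame(x, ID, fruit, stringsAsFactors = FALSE)
col_classes <- sapply(df2, class)
col_classes
```

Starting from data.frame() keeps `x` and `ID` numeric, so they sort numerically rather than as text.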