Question

I want to add a new column containing the first feature for each customer in the following dataset

mydf <- data.frame(customer = c(1, 2, 1, 2, 2, 1, 1), feature = c("other", "a", "b", "c", "other", "b", "c"))

    customer feature
1        1   other
2        2       a
3        1       b
4        2       c
5        2   other
6        1       b
7        1       c

I want to do this with dplyr. However, I want my code to ignore the "other" feature in the dataset and pick the first feature that is not "other".

So the following code is not enough, because it returns "other" for customer 1:

library(dplyr)
new <- mydf %>%
  group_by(customer) %>%
  mutate(firstfeature = first(feature))
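To see the problem concretely, here is a minimal sketch that runs the code above on the sample data. The stringsAsFactors = FALSE argument and the name naive are additions for illustration only; the former keeps feature a character column on R versions before 4.0. first() simply returns the first row of each group, so customer 1 gets "other".

```r
library(dplyr)

mydf <- data.frame(customer = c(1, 2, 1, 2, 2, 1, 1),
                   feature  = c("other", "a", "b", "c", "other", "b", "c"),
                   stringsAsFactors = FALSE)

# first() takes the first element of each customer's group, "other" included
naive <- mydf %>%
  group_by(customer) %>%
  mutate(firstfeature = first(feature)) %>%
  ungroup()

print(naive)
```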

How can I ignore "other" so that I get the following desired output:

    customer   feature   firstfeature
1        1   other            b
2        2       a            a
3        1       b            b
4        2       c            a
5        2   other            a
6        1       b            b
7        1       c            b

Answer 1

With dplyr, we can group by customer and, within each group, take the first feature that is not "other".

library(dplyr)
mydf %>%
   group_by(customer) %>%
   mutate(firstfeature = feature[feature != "other"][1])


#  customer feature firstfeature
#     <dbl>   <chr>        <chr>
#1        1   other            b
#2        2       a            a
#3        1       b            b
#4        2       c            a
#5        2   other            a
#6        1       b            b
#7        1       c            b
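The expression feature[feature != "other"][1] works in two steps: the logical subset drops every "other", and [1] takes the first value that survives. A minimal sketch of the same idea written with dplyr's first() applied to the filtered vector is below. The extra customer 3, whose only feature is "other", is a hypothetical row added to show the edge case: when nothing survives the filter, first() returns NA, just as [1] on an empty vector does.

```r
library(dplyr)

# Sample data plus a hypothetical customer 3 that only has "other"
mydf <- data.frame(customer = c(1, 2, 1, 2, 2, 1, 1, 3),
                   feature  = c("other", "a", "b", "c", "other", "b", "c", "other"),
                   stringsAsFactors = FALSE)

out <- mydf %>%
  group_by(customer) %>%
  # Drop "other" first, then take the first remaining value;
  # first() gives NA when a group has nothing left after filtering
  mutate(firstfeature = first(feature[feature != "other"])) %>%
  ungroup()

print(out)
```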

We can also do this in base R with ave:

mydf$firstfeature <- ave(mydf$feature, mydf$customer,
                         FUN = function(x) x[x != "other"][1])
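A self-contained sketch of the ave approach on the sample data follows. ave splits feature by customer, applies FUN to each piece, and writes each one-value result back to every row of that group. The stringsAsFactors = FALSE argument is an addition so that feature, and therefore the result, stays a character vector on R versions before 4.0.

```r
mydf <- data.frame(customer = c(1, 2, 1, 2, 2, 1, 1),
                   feature  = c("other", "a", "b", "c", "other", "b", "c"),
                   stringsAsFactors = FALSE)

# Per customer: drop "other", keep the first remaining feature,
# and recycle that single value over the group's rows
mydf$firstfeature <- ave(mydf$feature, mydf$customer,
                         FUN = function(x) x[x != "other"][1])

print(mydf)
```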

Answer 2

Another option is data.table:

library(data.table)
setDT(mydf)[, firstfeature := feature[feature != "other"][1], customer]
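Here is a self-contained version of the data.table one-liner. setDT() converts mydf to a data.table by reference, and := then adds firstfeature in place, computed separately for each customer. The third argument in the original is the by argument; it is written as by = customer below only to make that explicit.

```r
library(data.table)

mydf <- data.frame(customer = c(1, 2, 1, 2, 2, 1, 1),
                   feature  = c("other", "a", "b", "c", "other", "b", "c"),
                   stringsAsFactors = FALSE)

# Convert in place, then add the column group by group
setDT(mydf)[, firstfeature := feature[feature != "other"][1], by = customer]

print(mydf)
```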

Using dplyr first function but ignoring a particular character
