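How to reclassify/replace values based on priority when there are duplicates

Time: 2019-03-14 00:44:09

Tags: r dplyr plyr r-factor

I have a df in which value indicates the status of each drug: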

g1 = data.frame ( 
    drug = c('a','a','a','d','d'),
    value = c('fda','trial','case','case','pre')
)

  drug value
1    a   fda
2    a trial
3    a  case
4    d  case
5    d   pre

For each drug, I want to replace any duplicated value according to the following priority order for value:

fda > trial > case > pre 

So, for example, if drug d is both "case" and "pre", every occurrence of d is reclassified as "case". The final table should look like this:

  drug value
1    a   fda
2    a   fda
3    a   fda
4    d  case
5    d  case

How can I do this without looping over each drug, working out its highest-priority value first, and then replacing?

3 answers:

Answer 0: (score: 3)

Update using a mapping; I did it this way because I did not want to change the column type.

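The code for this answer is not preserved here. As a rough sketch of what a mapping-based update could look like (not necessarily the answerer's code), the example below assumes a hypothetical named vector `priority` that maps each status to a rank, takes the best rank per drug with `ave`, and maps it back to a label, so that value stays a character column:

```r
# Sketch only: `priority`, `rank` and `best` are illustrative names, not from the answer.
g1 <- data.frame(
  drug  = c("a", "a", "a", "d", "d"),
  value = c("fda", "trial", "case", "case", "pre"),
  stringsAsFactors = FALSE
)

# Lower number = higher priority (fda > trial > case > pre)
priority <- c(fda = 1, trial = 2, case = 3, pre = 4)

# Look up each row's rank by name; as.character() guards against factor input
rank <- unname(priority[as.character(g1$value)])

# Best (smallest) rank within each drug, repeated for every row of that drug
best <- ave(rank, g1$drug, FUN = min)

# Map the rank back to its label; the column stays character
g1$value <- names(priority)[best]
g1
```

Because the lookup goes through plain character and numeric vectors, no factor conversion is needed at any point.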

Answer 1: (score: 3)

Similar to @Wen-Ben's answer, using base functions you can also do the following:

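# Encode the priority order as factor levels (first level = highest priority)
g1$value <- factor(g1$value, levels = c("fda", "trial", "case", "pre"))
# Sort rows by priority so each drug's best status comes first (this reorders g1)
g1 <- g1[order(g1$value),]
# match() returns the first row of each drug, which now holds its best status
g1$value <- g1[match(g1$drug, g1$drug), "value"]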
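Note that the code above leaves g1 sorted by value and turns value into a factor. If the original row order matters, one possible variant (a sketch, with `lvls` and `sorted` as illustrative names) does the sorting on a copy and only uses it as a lookup table:

```r
g2 <- data.frame(
  drug  = c("a", "a", "a", "d", "d", "e", "e", "e"),
  value = c("fda", "trial", "case", "case", "pre", "pre", "fda", "case"),
  stringsAsFactors = FALSE
)

lvls <- c("fda", "trial", "case", "pre")   # highest priority first

# Sorted copy: within it, the first row of each drug carries its best status
sorted <- g2[order(match(g2$value, lvls)), ]

# For each original row, find the first row of the same drug in the sorted copy
g2$value <- sorted$value[match(g2$drug, sorted$drug)]
g2
```

The rows of g2 stay in their original order, and only the value column is overwritten.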

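Answer 2: (score: 3)

Since this is an ordinal variable, you can make g1$value an ordered factor, which is the class designed for it. You can then use functions such as min and max on it, just as you would with numbers: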

g1$value <- ordered(g1$value, levels = c("fda", "trial", "case", "pre"))
g1$value
#[1] fda   trial case  case  pre  
#Levels: fda < trial < case < pre
g1$value <- ave(g1$value, g1$drug, FUN=min)
g1
#  drug value
#1    a   fda
#2    a   fda
#3    a   fda
#4    d  case
#5    d  case
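As a small illustration of why min picks the highest-priority status here: with levels declared as fda < trial < case < pre, the "smallest" level is the one listed first. The vector `status` below is made up for the example:

```r
status <- ordered(c("trial", "pre", "case"),
                  levels = c("fda", "trial", "case", "pre"))

lowest  <- min(status)        # earliest level present: trial
highest <- max(status)        # latest level present: pre
before_case <- status < "case"  # element-wise comparison against a level

as.character(lowest)
as.character(highest)
before_case
```

An unordered factor does not support min or max, which is why ordered() (or factor(..., ordered = TRUE)) is needed for this approach.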

Or, in dplyr terms:

library(dplyr)

g1 %>%
  mutate(value = ordered(value, levels = c("fda", "trial", "case", "pre"))) %>%
  group_by(drug) %>%
  mutate(value = min(value))

Neither the row order of the dataset nor the range of values present within any drug group affects this result:

g2 = data.frame ( 
    drug = c( "a","a","a","d","d","e","e","e"),
    value = c("fda","trial","case","case","pre","pre","fda","case")
)

#  drug value
#1    a   fda
#2    a trial
#3    a  case
#4    d  case
#5    d   pre
#6    e   pre
#7    e   fda
#8    e  case

g2 %>%
  mutate(value = ordered(value, levels = c("fda", "trial", "case", "pre"))) %>%
  group_by(drug) %>%
  mutate(value = min(value))

## A tibble: 8 x 2
## Groups:   drug [3]
#  drug  value
#  <fct> <ord>
#1 a     fda  
#2 a     fda  
#3 a     fda  
#4 d     case 
#5 d     case 
#6 e     fda  
#7 e     fda  
#8 e     fda
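The pipeline above returns a grouped tibble whose value column is an ordered factor. If a plain character column and an ungrouped result are preferred, one possible variation (a sketch; `lvls` and `res` are illustrative names) ranks the values with match instead of converting them:

```r
library(dplyr)

g2 <- data.frame(
  drug  = c("a", "a", "a", "d", "d", "e", "e", "e"),
  value = c("fda", "trial", "case", "case", "pre", "pre", "fda", "case"),
  stringsAsFactors = FALSE
)

lvls <- c("fda", "trial", "case", "pre")   # highest priority first

res <- g2 %>%
  group_by(drug) %>%
  # match() gives each value's position in lvls; the smallest position wins
  mutate(value = lvls[min(match(value, lvls))]) %>%
  ungroup()

res
```

Calling ungroup() at the end avoids carrying the grouping into later steps by accident.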