I have a df in which value indicates the status of a drug:
g1 = data.frame (
drug = c('a','a','a','d','d'),
value = c('fda','trial','case','case','pre')
)
drug value
1 a fda
2 a trial
3 a case
4 d case
5 d pre
For each drug, I want to replace all of its value entries according to the following order of precedence:
fda > trial > case > pre
So, for example, if drug d is both "case" and "pre", all occurrences of d would be recategorized as "case". The final table should look like this:
drug value
1 a fda
2 a fda
3 a fda
4 d case
5 d case
How can I do this without having to loop through each drug, work out the precedence first, and then replace?
Answer 0 (score: 3)
Updated to use a mapping, because I did not want to change the column type.
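One way a mapping-based update could look, as a minimal sketch rather than the answerer's own code: a named integer vector translates each status to a rank, the best rank per drug is taken, and the rank is translated back to a status, so the column stays character. The names `priority`, `rank`, and `best` are hypothetical and chosen only for illustration.

```r
g1 <- data.frame(
  drug  = c("a", "a", "a", "d", "d"),
  value = c("fda", "trial", "case", "case", "pre"),
  stringsAsFactors = FALSE
)

# Hypothetical lookup table: a smaller number means a higher priority.
priority <- c(fda = 1L, trial = 2L, case = 3L, pre = 4L)

# Translate each status to its rank, then take the best (smallest) rank per drug.
rank <- unname(priority[g1$value])
best <- ave(rank, g1$drug, FUN = min)

# Translate the winning rank back to its status; value remains a character column.
g1$value <- names(priority)[best]
g1
```

Because the lookup happens on a separate vector, no factor conversion of the column is needed.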
Answer 1 (score: 3)
Similar to @Wen-Ben's answer, using base functions you could also do the following:
g1$value <- factor(g1$value, levels = c("fda", "trial", "case", "pre"))
g1 <- g1[order(g1$value),]
g1$value <- g1[match(g1$drug, g1$drug), "value"]
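Note that the code above sorts the rows of g1 by value. If the original row order should be kept, a variant along the following lines may work. It is a sketch under the same idea (the first match in priority order wins), with `ord` and `first_idx` as illustrative names that do not appear in the answer.

```r
g2 <- data.frame(
  drug  = c("a", "a", "a", "d", "d", "e", "e", "e"),
  value = c("fda", "trial", "case", "case", "pre", "pre", "fda", "case"),
  stringsAsFactors = FALSE
)
g2$value <- factor(g2$value, levels = c("fda", "trial", "case", "pre"))

# Row indices in priority order; the data frame itself is not reordered.
ord <- order(g2$value)

# For each row, find the highest-priority row belonging to the same drug.
first_idx <- ord[match(g2$drug, g2$drug[ord])]

# Copy that row's value; rows stay in their original positions.
g2$value <- g2$value[first_idx]
g2
```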
Answer 2 (score: 3)
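Since this is an ordinal variable, you can make g1$value an ordered factor, which is the appropriate class for it. You can then use functions such as min and max on it, just as you would with numbers: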
g1$value <- ordered(g1$value, levels = c("fda", "trial", "case", "pre"))
g1$value
#[1] fda trial case case pre
#Levels: fda < trial < case < pre
g1$value <- ave(g1$value, g1$drug, FUN=min)
g1
# drug value
#1 a fda
#2 a fda
#3 a fda
#4 d case
#5 d case
Or, in dplyr terms:
library(dplyr)

g1 %>%
  mutate(value = ordered(value, levels = c("fda", "trial", "case", "pre"))) %>%
  group_by(drug) %>%
  mutate(value = min(value))
Neither the row order in the dataset nor the range of values present within any drug group affects this result:
g2 = data.frame (
drug = c( "a","a","a","d","d","e","e","e"),
value = c("fda","trial","case","case","pre","pre","fda","case")
)
# drug value
#1 a fda
#2 a trial
#3 a case
#4 d case
#5 d pre
#6 e pre
#7 e fda
#8 e case
g2 %>%
mutate(value = ordered(value, levels = c("fda", "trial", "case", "pre"))) %>%
group_by(drug) %>%
mutate(value = min(value))
## A tibble: 8 x 2
## Groups: drug [3]
# drug value
# <fct> <ord>
#1 a fda
#2 a fda
#3 a fda
#4 d case
#5 d case
#6 e fda
#7 e fda
#8 e fda
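The approach in this answer depends on value being an ordered factor, not a plain one. The short check below is a sketch of that difference; the names `levels_by_priority`, `plain`, and `ord` are illustrative only.

```r
levels_by_priority <- c("fda", "trial", "case", "pre")
x <- c("case", "pre", "fda")

plain <- factor(x, levels = levels_by_priority)
ord   <- ordered(x, levels = levels_by_priority)

# min() is not defined for unordered factors, so this call signals an error,
# which is caught here and replaced by the string "error".
plain_result <- tryCatch(as.character(min(plain)), error = function(e) "error")

# For an ordered factor, min() returns the earliest level that is present.
ord_result <- as.character(min(ord))

plain_result
ord_result
```

If a character column is wanted afterwards, wrapping the result in as.character(), as done here, converts it back.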