Conditionally replace elements in a data frame using dplyr

Posted: 2018-09-21 19:58:40

Tags: r dplyr

head(df)

genotype.og label genotype
MT1         MT1   MT
MT2         MT2   MT
MT3         MT3   MT
MT4         MT4   MT
MT5         MT5   MT
MT6         MT6   MT
WT1         WT1   WT
WT4         WT4   WT
WT11        WT11  WT
WT13        WT13  WT
WT27        WT27  WT
WT28        WT28  WT
WT74        WT74  WT
WT53        WT53  WT
WT68        WT68  WT
WT84        WT84  WT
WT92        WT92  WT
WT95        WT95  WT

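Somewhere in the rows I have WT1..WTn rows. I'm trying to change only a few elements in the label column without changing the rest, so I made a variable called old that contains the names of the elements that need to be replaced/renamed.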

Code

old = c("WT1","WT4","WT11","WT13","WT27","WT28","WT74","WT53","WT68","WT84","WT92","WT95")
new = c("WS1","WS4","WS11","WS13","WS27","WS28","WS74","WS53","WS68","WS84","WS92","WS95")
df$label_new <- df$label %>% rename_if(vars(old), ~ new)

Error

Error in UseMethod("tbl_vars") : 
  no applicable method for 'tbl_vars' applied to an object of class "character" 

Desired output

genotype.og label genotype label_new
MT1         MT1   MT       MT1
MT2         MT2   MT       MT2
MT3         MT3   MT       MT3
MT4         MT4   MT       MT4
MT5         MT5   MT       MT5
MT6         MT6   MT       MT6
WT1         WT1   WT       WS1
WT4         WT4   WT       WS4
WT11        WT11  WT       WS11
WT13        WT13  WT       WS13
WT27        WT27  WT       WS27
WT28        WT28  WT       WS28
WT74        WT74  WT       WS74
WT53        WT53  WT       WS53
WT68        WT68  WT       WS68
WT84        WT84  WT       WS84
WT92        WT92  WT       WS92
WT95        WT95  WT       WS95

What am I missing?
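Every entry of `new` in the question is the matching entry of `old` with the `WT` prefix changed to `WS`, so `new` could probably be derived instead of typed by hand. A minimal base R sketch, assuming that prefix swap is the only renaming rule:

```r
# Same hand-typed vector as in the question
old <- c("WT1","WT4","WT11","WT13","WT27","WT28","WT74","WT53","WT68","WT84","WT92","WT95")

# Assumption: each replacement is the old label with a leading "WT" swapped for "WS"
new <- sub("^WT", "WS", old)
print(new)
```

Deriving `new` this way keeps the two vectors the same length and in the same order by construction, which all the answers below rely on.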

2 answers:

Answer 0 (score: 2)

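Use str_replace_all from stringr. str_replace_all takes a named vector of matches and replacements, where the names are the patterns to match and the values are the replacements. The regex metacharacters ^ and $ are wrapped around each pattern to make sure it matches the whole string exactly: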

library(stringr)
library(dplyr)

df %>%
  # Names are anchored regex patterns (^old$), values are the replacements
  mutate(label_new = str_replace_all(label, setNames(new, paste0('^', old, '$'))))

The lookup vector becomes:

> setNames(new, paste0('^', old, '$'))
 ^WT1$  ^WT4$ ^WT11$ ^WT13$ ^WT27$ ^WT28$ ^WT74$ ^WT53$ ^WT68$ ^WT84$ ^WT92$ ^WT95$ 
 "WS1"  "WS4" "WS11" "WS13" "WS27" "WS28" "WS74" "WS53" "WS68" "WS84" "WS92" "WS95"
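Why the ^ and $ anchors matter can be seen with base R's gsub(), which uses the same kind of regex patterns. This is an illustration only, not part of the answer; `WT13` is treated here as a hypothetical label that is *not* in `old`. (In the question's data every WT label happens to be in `old`, so the unanchored version would give the same result there.)

```r
# Hypothetical labels: only "WT1" is meant to be renamed
x <- c("WT1", "WT13")

# Unanchored: "WT1" also matches the first three characters of "WT13"
unanchored <- gsub("WT1", "WS1", x)

# Anchored: ^ and $ force the pattern to match the whole string
anchored <- gsub("^WT1$", "WS1", x)

print(unanchored)
print(anchored)
```

The unanchored call turns `WT13` into `WS13` by accident, while the anchored call leaves it alone.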

Or with base R:

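df$label_new <- df$label
# Position of each label in old (NA if the label is not being replaced)
label_match <- match(df$label_new, old)
# Overwrite only the matched rows with the corresponding entries of new
df$label_new[!is.na(label_match)] <- new[na.omit(label_match)]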

Output:

   genotype.og label genotype label_new
1          MT1   MT1       MT       MT1
2          MT2   MT2       MT       MT2
3          MT3   MT3       MT       MT3
4          MT4   MT4       MT       MT4
5          MT5   MT5       MT       MT5
6          MT6   MT6       MT       MT6
7          WT1   WT1       WT       WS1
8          WT4   WT4       WT       WS4
9         WT11  WT11       WT      WS11
10        WT13  WT13       WT      WS13
11        WT27  WT27       WT      WS27
12        WT28  WT28       WT      WS28
13        WT74  WT74       WT      WS74
14        WT53  WT53       WT      WS53
15        WT68  WT68       WT      WS68
16        WT84  WT84       WT      WS84
17        WT92  WT92       WT      WS92
18        WT95  WT95       WT      WS95
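The base R match() step is easiest to follow on a small vector. A minimal sketch with hypothetical toy vectors (smaller than the question's) showing the intermediate index vector:

```r
# Hypothetical toy vectors
old <- c("WT1", "WT4")
new <- c("WS1", "WS4")
label <- c("MT1", "WT4", "WT1", "MT2")

# Position of each label in old; NA where the label is not in old
label_match <- match(label, old)
print(label_match)

label_new <- label
# The non-NA indices, read in row order, line up with the rows being replaced
label_new[!is.na(label_match)] <- new[na.omit(label_match)]
print(label_new)
```

Here `label_match` is `NA 2 1 NA`, so rows 2 and 3 receive `new[2]` and `new[1]` respectively, and the MT rows keep their original labels.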

Data:

df <- structure(list(genotype.og = c("MT1", "MT2", "MT3", "MT4", "MT5", 
"MT6", "WT1", "WT4", "WT11", "WT13", "WT27", "WT28", "WT74", 
"WT53", "WT68", "WT84", "WT92", "WT95"), label = c("MT1", "MT2", 
"MT3", "MT4", "MT5", "MT6", "WT1", "WT4", "WT11", "WT13", "WT27", 
"WT28", "WT74", "WT53", "WT68", "WT84", "WT92", "WT95"), genotype = c("MT", 
"MT", "MT", "MT", "MT", "MT", "WT", "WT", "WT", "WT", "WT", "WT", 
"WT", "WT", "WT", "WT", "WT", "WT")), .Names = c("genotype.og", 
"label", "genotype"), class = "data.frame", row.names = c(NA, 
-18L))

Answer 1 (score: 1)

I prefer @avid_useR's solution, but here is an approach that makes the mapping process explicit.

library(dplyr)
df <- df %>% 
  # Join with the replacement strings; NA where no replacement
  # (tibble() replaces the deprecated data_frame())
  left_join(tibble(old, new), by = c("label" = "old")) %>%
  # Update label to use replacement where available
  mutate(label = if_else(is.na(new), label, new)) %>%
  select(-new)
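The same look-up-then-fall-back idea can also be sketched in base R, using a named vector as the mapping table instead of a join. This is an alternative, not what the answer above does; it writes to a new `label_new` column as in the desired output, and the toy `df` is hypothetical:

```r
# Hypothetical toy data in the shape of the question's df
df <- data.frame(label = c("MT1", "WT1", "WT4"), stringsAsFactors = FALSE)
old <- c("WT1", "WT4")
new <- c("WS1", "WS4")

# Mapping table: names are the old labels, values are the replacements
lookup <- setNames(new, old)

# Look each label up; labels missing from the table come back as NA
mapped <- unname(lookup[df$label])

# Fall back to the original label where there is no replacement,
# mirroring if_else(is.na(new), label, new) in the join version
df$label_new <- ifelse(is.na(mapped), df$label, mapped)
print(df)
```

Indexing a named vector by name plays the role of the left join here: a name that is not present yields NA, just like an unmatched row after `left_join()`.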