Match/replace with R

Time: 2019-02-10 18:08:59

Tags: r loops replace pattern-matching match

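Let me start by saying that I am trying to learn R, and it has not come easily. Similar to this post here, I am trying to match the values in several columns of one data frame (df) and then replace them with the values from the corresponding column of another data frame (df.key). Here is a sample of my df: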

name  type  place  ttotal  t01  t02  t03  t04  t05  t06  t07  t08  t09
joe   cat   SE          7    3    2    2    3    2    5    2    0    1
john  cat   SE          2    0    0    4    0    3    1    3    1    7
sue   cat   SE          1    2    0    5    0    4    1    4    3    0
jack  cat   SE          6    3    4    2    2    4    0    2    1    5

Below is my df.key. I want to match the values in columns df$ttotal through df$t09 above against df.key$class and replace each one with the corresponding value in df.key$mid:

lo  hi class mid 
0    0    0  0.0
0    1    1  0.5
1    2    2  3.0    
5   10    3  7.5   
10  20    4 15.0 
20  30    5 25.0 
30  40    6 35.0 
40  50    7 45.0 

So the first row should be:

name  type  place  ttotal  t01  t02  t03  t04  t05  t06   t07  t08  t09
joe   cat   SE       45.0  7.5  3.0  3.0  7.5  3.0  25.0  3.0  0.0  0.5

This is just one of the match loops I have tried, but it fills each row with the same repeated value:

for(i in 1:dim(df)[1]){
  for(j in df$4:13) {
    df[i,j] <- df.key$mid[match(i, df.key$class)]
  }
}

Thanks for your help. I would like to end up with a solution along the lines of this loop, in the hope that I can understand it.

2 answers:

Answer 0 (score: 0)

You could do:

library(tidyverse)

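df %>%
  # reshape to long format: one row per (name, column) pair
  gather(key, val, ttotal:t09) %>%
  # look up mid for each value; columns 3:4 of df.key are class and mid
  left_join(df.key %>% select(3:4), by = c("val" = "class")) %>%
  # back to wide format; val is still present, so rows are padded with NA
  spread(key, mid) %>%
  # collapse to one row per name, keeping the first non-NA entry per column
  group_by(name) %>%
  summarise_all(funs(first(na.omit(.)))) %>%
  select(-val)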

Output:

# A tibble: 4 x 13
  name  type  place   t01   t02   t03   t04   t05   t06   t07   t08   t09 ttotal
  <chr> <chr> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>  <dbl>
1 jack  cat   SE      7.5    15     3   3    15     0     3     0.5  25     35  
2 joe   cat   SE      7.5     3     3   7.5   3    25     3     0     0.5   45  
3 john  cat   SE      0       0    15   0     7.5   0.5   7.5   0.5  45      3  
4 sue   cat   SE      3       0    25   0    15     0.5  15     7.5   0      0.5
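
For reference, on tidyr 1.0.0 or later the same reshape, join, reshape idea can be written with pivot_longer() and pivot_wider(), which supersede gather() and spread(). The following is only a sketch under that version assumption. The data frames dat and keys are rebuilt here from the data in the question so that the block runs on its own, and res is just an illustrative name:

```r
library(dplyr)
library(tidyr)

# Data rebuilt from the question (df) and the lookup table (df.key)
dat <- data.frame(
  name   = c("joe", "john", "sue", "jack"),
  type   = "cat",
  place  = "SE",
  ttotal = c(7L, 2L, 1L, 6L),
  t01 = c(3L, 0L, 2L, 3L), t02 = c(2L, 0L, 0L, 4L), t03 = c(2L, 4L, 5L, 2L),
  t04 = c(3L, 0L, 0L, 2L), t05 = c(2L, 3L, 4L, 4L), t06 = c(5L, 1L, 1L, 0L),
  t07 = c(2L, 3L, 4L, 2L), t08 = c(0L, 1L, 3L, 1L), t09 = c(1L, 7L, 0L, 5L),
  stringsAsFactors = FALSE
)
keys <- data.frame(
  class = 0:7,
  mid   = c(0, 0.5, 3, 7.5, 15, 25, 35, 45)
)

res <- dat %>%
  # long format: one row per (name, column) pair
  pivot_longer(cols = ttotal:t09, names_to = "key", values_to = "val") %>%
  # attach the mid value whose class equals the original value
  left_join(select(keys, class, mid), by = c("val" = "class")) %>%
  # drop the original value so that widening yields one row per name
  select(-val) %>%
  # back to wide format, now holding mid instead of the class codes
  pivot_wider(names_from = key, values_from = mid)
```

Because val is dropped before widening, no NA-padded rows are created, so the group_by() and summarise step is not needed, and the rows keep their original order instead of being sorted by name.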

Answer 1 (score: 0)

You can simply map the keys onto your data:


library(tidyverse)

mutate_at(dat, vars(ttotal:t09), funs(map_dbl(., ~ keys$mid[keys$class == .x])))

Which outputs:

  name type place ttotal t01 t02 t03 t04  t05  t06  t07 t08  t09
1  joe  cat    SE   45.0 7.5   3   3 7.5  3.0 25.0  3.0 0.0  0.5
2 john  cat    SE    3.0 0.0   0  15 0.0  7.5  0.5  7.5 0.5 45.0
3  sue  cat    SE    0.5 3.0   0  25 0.0 15.0  0.5 15.0 7.5  0.0
4 jack  cat    SE   35.0 7.5  15   3 3.0 15.0  0.0  3.0 0.5 25.0

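Explanation:

With dplyr::mutate_at() you modify the variables selected by vars(ttotal:t09), applying the function given in funs(...) to each of them. For each variable, map_dbl(., ~ keys$mid[keys$class == .x]) compares every element with keys$class (keys$class == .x) and uses the resulting logical vector to subset keys$mid.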
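
The same element-wise lookup can also be sketched in base R with match(), which may be closer to the loop attempted in the question. This is a minimal sketch, not a drop-in replacement for the code above. It assumes the columns to recode sit at positions 4 to 13, and the names cols, res and res_loop are illustrative:

```r
# Data rebuilt from the question so the sketch runs on its own
dat <- data.frame(
  name   = c("joe", "john", "sue", "jack"),
  type   = "cat",
  place  = "SE",
  ttotal = c(7L, 2L, 1L, 6L),
  t01 = c(3L, 0L, 2L, 3L), t02 = c(2L, 0L, 0L, 4L), t03 = c(2L, 4L, 5L, 2L),
  t04 = c(3L, 0L, 0L, 2L), t05 = c(2L, 3L, 4L, 4L), t06 = c(5L, 1L, 1L, 0L),
  t07 = c(2L, 3L, 4L, 2L), t08 = c(0L, 1L, 3L, 1L), t09 = c(1L, 7L, 0L, 5L),
  stringsAsFactors = FALSE
)
keys <- data.frame(
  lo    = c(0L, 0L, 1L, 5L, 10L, 20L, 30L, 40L),
  hi    = c(0L, 1L, 2L, 10L, 20L, 30L, 40L, 50L),
  class = 0:7,
  mid   = c(0, 0.5, 3, 7.5, 15, 25, 35, 45)
)

cols <- 4:13  # assumed positions of ttotal through t09

# Vectorised lookup: match() returns the row of keys whose class equals
# each value, and that row index is used to pick the mid value
res <- dat
res[cols] <- lapply(dat[cols], function(x) keys$mid[match(x, keys$class)])

# The same lookup as an explicit double loop. The cell value dat[i, j],
# not the row index i, is what gets matched against keys$class.
res_loop <- dat
for (i in seq_len(nrow(dat))) {
  for (j in cols) {
    res_loop[i, j] <- keys$mid[match(dat[i, j], keys$class)]
  }
}
```

Both res and res_loop should hold the recoded values. A value with no matching class would come back as NA, because match() returns NA when it finds no match.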


Your data:

dat <-
  structure(
    list(
      name = c("joe", "john", "sue", "jack"),
      type = c("cat",
               "cat", "cat", "cat"),
      place = c("SE", "SE", "SE", "SE"),
      ttotal = c(7L,
                 2L, 1L, 6L),
      t01 = c(3L, 0L, 2L, 3L),
      t02 = c(2L, 0L, 0L, 4L),
      t03 = c(2L, 4L, 5L, 2L),
      t04 = c(3L, 0L, 0L, 2L),
      t05 = c(2L,
              3L, 4L, 4L),
      t06 = c(5L, 1L, 1L, 0L),
      t07 = c(2L, 3L, 4L,
              2L),
      t08 = c(0L, 1L, 3L, 1L),
      t09 = c(1L, 7L, 0L, 5L)
    ),
    class = "data.frame",
    row.names = c(NA,-4L)
  )

keys <-
  structure(
    list(
      lo = c(0L, 0L, 1L, 5L, 10L, 20L, 30L, 40L),
      hi = c(0L,
             1L, 2L, 10L, 20L, 30L, 40L, 50L),
      class = 0:7,
      mid = c(0, 0.5,
              3, 7.5, 15, 25, 35, 45)
    ),
    class = "data.frame",
    row.names = c(NA,-8L)
  )