Replace placeholder values in a data frame with values from another column

Time: 2019-12-27 21:17:39

Tags: r dplyr

I have a data frame that looks like this:

df <-
  structure(
    list(
      Exception1 = c(
        "Comments from {2}: {0}",
        "status updated to {1} by {2}. Description:{0}",
        "status updated to {1} by {2}. Description:{0}",
        "information only.",
        "status updated to {1} by {2}. Description:{0}",
        "status updated to {1} by {2}. Description:{0}"
      ),
      Exception2 = c(
        "Customer {0} said bla",
        "Status updated to {1}",
        "Customer said {2}",
        "User {0} foo",
        "{0} {1}",
        "{1} {2}"
      ),
      ARGUMENT1 = c("OK", " ", " ", "PAY9089723089-98391", " ", " "),
      ARGUMENT2 = c(
        "null",
        "Processing",
        "Reconciled",
        "null",
        "Processing",
        "Reconciled"
      ),
      ARGUMENT3 = c(
        "company name",
        "company name",
        "company name",
        "null",
        "company name",
        "company name"
      )
    ),
    row.names = c(NA, 6L),
    class = "data.frame"
  )

| Exception1                                    | Exception2            | ARGUMENT1           | ARGUMENT2  | ARGUMENT3    |
|-----------------------------------------------|-----------------------|---------------------|------------|--------------|
| Comments from {2}: {0}                        | Customer {0} said bla | OK                  | null       | company name |
| status updated to {1} by {2}. Description:{0} | Status updated to {1} |                     | Processing | company name |
| status updated to {1} by {2}. Description:{0} | Customer said {2}     |                     | Reconciled | company name |
| information only.                             | User {0} foo          | PAY9089723089-98391 | null       | null         |
| status updated to {1} by {2}. Description:{0} | {0} {1}               |                     | Processing | company name |
| status updated to {1} by {2}. Description:{0} | {1} {2}               |                     | Reconciled | company name |

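The Exception1 and Exception2 columns (I removed several other Exception columns for readability) contain placeholders of the form {n}, which should be replaced with the values from the ARGUMENT* columns: {0} with ARGUMENT1, {1} with ARGUMENT2, and {2} with ARGUMENT3.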

I have been looking for ways to achieve this and have had moderate success, but I still lack the experience to do it better.

I wrote a simple function that does the replacement with gsub:

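library(magrittr)  # provides %>% and the compound assignment pipe %<>%

# x is one row of df, passed in by apply() as a named character vector
excp_ren2 <- function(x) {
  x %<>%
    gsub("\\{1\\}", x["ARGUMENT2"], .) %>%
    gsub("\\{0\\}", x["ARGUMENT1"], .) %>%
    gsub("\\{2\\}", x["ARGUMENT3"], .)
  x
}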

I then experimented with apply and its variants, and got a good result with, for example:

new_df <-
  df %>% apply(
    .,
    MARGIN = 1,
    FUN = function(x)
      excp_ren2(x)
  ) %>% as.data.frame()

The only catch is that this transposes the matrix, but that is not really a problem.
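One possible way to undo the transposition is to wrap the apply() call in t() before converting back to a data frame. The sketch below is only an illustration: toy is a small made-up data frame with the same column layout as df, and excp_ren2_base is a hypothetical base-R rewrite of excp_ren2 so that the example has no package dependencies.

```r
# Made-up data with the same column layout as the question's df
toy <- data.frame(
  Exception1 = c("Comments from {2}: {0}", "status updated to {1} by {2}"),
  ARGUMENT1  = c("OK", "n/a"),
  ARGUMENT2  = c("null", "Processing"),
  ARGUMENT3  = c("company name", "company name"),
  stringsAsFactors = FALSE
)

# Hypothetical base-R equivalent of excp_ren2: x is one row as a named
# character vector; gsub() keeps the names of x
excp_ren2_base <- function(x) {
  x <- gsub("\\{1\\}", x["ARGUMENT2"], x)
  x <- gsub("\\{0\\}", x["ARGUMENT1"], x)
  x <- gsub("\\{2\\}", x["ARGUMENT3"], x)
  x
}

# apply() returns one column per input row, so t() restores the orientation
fixed_df <- as.data.frame(
  t(apply(toy, 1, excp_ren2_base)),
  stringsAsFactors = FALSE
)
fixed_df$Exception1
```

Note that apply() first converts the data frame to a character matrix, so this only suits data frames whose columns are all strings.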

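I am looking for a better way to do this. I thought it could be done with mutate_*, but I don't think I can access the row's columns by name inside the function, or at least I don't know how. Any ideas for a simpler way to achieve this?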

Thanks!

3 Answers:

Answer 0 (score: 2)

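Instead of doing this row by row, we can use stringr's str_replace_all, which is vectorized, in a pipe. The first example handles only "Exception1"; the second applies the same function to every Exception column.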

library(stringr)
library(dplyr)
df %>%
  transmute(new =  str_replace_all(Exception1, "\\{1\\}", ARGUMENT2) %>% 
                   str_replace_all("\\{0\\}", ARGUMENT1) %>% 
                   str_replace_all("\\{2\\}", ARGUMENT3))
#                                                          new
#1                              Comments from company name: OK
#2 status updated to Processing by company name. Description: 
#3 status updated to Reconciled by company name. Description: 
#4                                           information only.
#5 status updated to Processing by company name. Description: 
#6 status updated to Reconciled by company name. Description: 

If we have multiple columns, we can use mutate_at or transmute_at:

df %>%
   transmute_at(vars(starts_with("Exception")), ~ 
           str_replace_all(., "\\{1\\}", ARGUMENT2) %>% 
                   str_replace_all("\\{0\\}", ARGUMENT1) %>% 
                   str_replace_all("\\{2\\}", ARGUMENT3))
#                                                   Exception1                   Exception2
#1                              Comments from company name: OK         Customer OK said bla
#2 status updated to Processing by company name. Description:  Status updated to Processing
#3 status updated to Reconciled by company name. Description:    Customer said company name
#4                                           information only. User PAY9089723089-98391 foo
#5 status updated to Processing by company name. Description:                    Processing
#6 status updated to Reconciled by company name. Description:       Reconciled company name
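In dplyr 1.0.0 and later, the scoped verbs mutate_at and transmute_at are superseded by across(). A minimal sketch of the same replacement, assuming that dplyr version or newer and using a small made-up data frame (toy) in place of the question's df:

```r
library(dplyr)
library(stringr)

# Made-up data with the same column layout as the question's df
toy <- data.frame(
  Exception1 = c("Comments from {2}: {0}", "status updated to {1} by {2}"),
  Exception2 = c("Customer {0} said bla", "{1} {2}"),
  ARGUMENT1  = c("OK", "n/a"),
  ARGUMENT2  = c("null", "Processing"),
  ARGUMENT3  = c("company name", "company name"),
  stringsAsFactors = FALSE
)

# across() applies the lambda to every Exception* column; the ARGUMENT*
# columns are looked up in the data frame, one value per row
filled <- toy %>%
  mutate(across(
    starts_with("Exception"),
    ~ .x %>%
      str_replace_all("\\{0\\}", ARGUMENT1) %>%
      str_replace_all("\\{1\\}", ARGUMENT2) %>%
      str_replace_all("\\{2\\}", ARGUMENT3)
  ))

filled[, c("Exception1", "Exception2")]
```

Unlike transmute_at, mutate keeps the ARGUMENT* columns in the result; select them away afterwards if only the filled templates are needed.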

Answer 1 (score: 1)

Maybe something like this:

library(dplyr)
library(stringr)

# A reusable pipeline: each step fills one placeholder from its ARGUMENT column
clean_pipe <- . %>% 
  mutate(new_string = Exception1 %>% str_replace_all(pattern = "\\{0\\}", replacement = ARGUMENT1)) %>% 
  mutate(new_string = new_string %>% str_replace_all(pattern = "\\{1\\}", replacement = ARGUMENT2)) %>% 
  mutate(new_string = new_string %>% str_replace_all(pattern = "\\{2\\}", replacement = ARGUMENT3))

df %>% 
  clean_pipe
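The three chained replacements follow one pattern per placeholder, so with more ARGUMENT columns the same idea could be written as a loop. The helper below, fill_placeholders, is a hypothetical name introduced here for illustration and is not part of any package. It assumes that element i of args fills placeholder {i-1}, and it matches the placeholders literally with stringr's fixed().

```r
library(stringr)

# Hypothetical helper: replace {0}, {1}, ... in `template` with the vectors
# in `args` (a list; element i fills placeholder {i - 1})
fill_placeholders <- function(template, args) {
  for (i in seq_along(args)) {
    placeholder <- fixed(paste0("{", i - 1, "}"))
    # vectorized over rows: row k of template gets row k of args[[i]]
    template <- str_replace_all(template, placeholder, args[[i]])
  }
  template
}

# Made-up example values
templates <- c("Comments from {2}: {0}", "{1} {2}")
arguments <- list(
  c("OK", "n/a"),
  c("null", "Processing"),
  c("company name", "company name")
)
result <- fill_placeholders(templates, arguments)
result
```

Because the replacements run one after another, an argument value that itself contains text such as {1} would be substituted again by a later step; that case does not occur in the sample data.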

Answer 2 (score: 1)

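Your use of { } as delimiters made me think of glue, which works in a similar way. To make the glue templates match the column names in your data, first pass a named vector to stringr::str_replace_all so that all the patterns are matched and replaced in one step. Then build glue objects from the "Exception*" columns. According to this post (R dplyr: rowwise + mutate (+glue) - how to get/refer row content?), you need rowwise, because otherwise glue will try to use every argument column's value for each template. I wanted to put both mutate_at steps into one function but ran into scoping trouble, so this is the simplest version I could get working.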

library(dplyr)
library(tidyr)

replacements <- c("\\{1\\}" = "{ARGUMENT2}",
                  "\\{0\\}" = "{ARGUMENT1}",
                  "\\{2\\}" = "{ARGUMENT3}")

as_tibble(df) %>%
  rowwise() %>%
  mutate_at(vars(starts_with("Exception")), stringr::str_replace_all, replacements) %>%
  mutate_at(vars(starts_with("Exception")), ~as.character(glue::glue(.)))
#> Source: local data frame [6 x 5]
#> Groups: <by row>
#> 
#> # A tibble: 6 x 5
#>   Exception1                  Exception2       ARGUMENT1     ARGUMENT2 ARGUMENT3
#>   <chr>                       <chr>            <chr>         <chr>     <chr>    
#> 1 Comments from company name… Customer OK sai… OK            null      company …
#> 2 "status updated to Process… Status updated … " "           Processi… company …
#> 3 "status updated to Reconci… Customer said c… " "           Reconcil… company …
#> 4 information only.           User PAY9089723… PAY908972308… null      null     
#> 5 "status updated to Process… "  Processing"   " "           Processi… company …
#> 6 "status updated to Reconci… Reconciled comp… " "           Reconcil… company …

Note that because some of the argument strings are blank, the results contain extra whitespace, which you can trim with trimws.
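As a small illustration of that last step, here is a sketch using two made-up strings shaped like the results above; trimws is base R and removes leading and trailing whitespace by default.

```r
# Made-up strings with the stray whitespace left by blank arguments
padded <- c(
  "status updated to Processing by company name. Description: ",
  "  Processing"
)

# Strip leading and trailing whitespace from each element
trimmed <- trimws(padded)
trimmed
```

To apply it to every Exception column, one option would be to add mutate_at(vars(starts_with("Exception")), trimws) to the end of the pipeline.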