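I'm trying to build a helper function that extracts the digits from the name of the column passed as its argument. I can use my function inside mutate (repeating the call for every column of interest), but it doesn't seem to work inside mutate_at.
Here is a sample of my data: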
> set.seed(20190928)
> evalYr <- 2018
> n <- 5
> (df <- data.frame(
+ AY = sample(2016:2019, n, replace = T),
+ Pay00 = rgamma(n, 2, 1/1000),
+ Pay01 = rgamma(n, 2, 1/1000),
+ Pay02 = rgamma(n, 2, 1/1000),
+ Pay03 = rgamma(n, 2, 1/1000)
+ ))
AY Pay00 Pay01 Pay02 Pay03
1 2018 2520.3772 2338.9490 919.8245 629.1657
2 2016 259.7804 1543.4450 661.6488 2382.7916
3 2018 2446.3075 312.5143 2297.9717 942.5627
4 2017 1386.6288 4179.0352 2370.2669 1846.5838
5 2018 541.8261 2104.4589 2622.1758 2606.0694
So I built this helper (using dplyr syntax) to mutate over each PayXX column I have:
# Helper function to get the number inside column `PayXX` name
f1 <- function(pmt) enquo(pmt) %>% quo_name() %>% str_extract('(\\d)+') %>% as.numeric()
This function works fine with dplyr::mutate:
> df %>% mutate(Pay00_numcol = f1(Pay00),
+ Pay01_numcol = f1(Pay01),
+ Pay02_numcol = f1(Pay02),
+ Pay03_numcol = f1(Pay03))
AY Pay00 Pay01 Pay02 Pay03 Pay00_numcol Pay01_numcol Pay02_numcol Pay03_numcol
1 2018 2520.3772 2338.9490 919.8245 629.1657 0 1 2 3
2 2016 259.7804 1543.4450 661.6488 2382.7916 0 1 2 3
3 2018 2446.3075 312.5143 2297.9717 942.5627 0 1 2 3
4 2017 1386.6288 4179.0352 2370.2669 1846.5838 0 1 2 3
5 2018 541.8261 2104.4589 2622.1758 2606.0694 0 1 2 3
But when I try to use the same function inside mutate_at, it returns NA:
> df %>% mutate_at(vars(starts_with('Pay')), list(numcol = ~f1(.)))
AY Pay00 Pay01 Pay02 Pay03 Pay00_numcol Pay01_numcol Pay02_numcol Pay03_numcol
1 2018 2520.3772 2338.9490 919.8245 629.1657 NA NA NA NA
2 2016 259.7804 1543.4450 661.6488 2382.7916 NA NA NA NA
3 2018 2446.3075 312.5143 2297.9717 942.5627 NA NA NA NA
4 2017 1386.6288 4179.0352 2370.2669 1846.5838 NA NA NA NA
5 2018 541.8261 2104.4589 2622.1758 2606.0694 NA NA NA NA
Has anyone run into a similar problem? How should I handle the mutate_at function in this case?
Thanks
Reproducible code:
library(tidyverse)
library(stringr)
set.seed(20190928)
evalYr <- 2018
n <- 5
(df <- data.frame(
  AY = sample(2016:2019, n, replace = TRUE),
  Pay00 = rgamma(n, 2, 1/1000),
  Pay01 = rgamma(n, 2, 1/1000),
  Pay02 = rgamma(n, 2, 1/1000),
  Pay03 = rgamma(n, 2, 1/1000)
))
# Helper function to get the number inside column `PayXX` name
f1 <- function(pmt) enquo(pmt) %>% quo_name() %>% str_extract('(\\d)+') %>% as.numeric()
# Working
df %>% mutate(Pay00_numcol = f1(Pay00),
              Pay01_numcol = f1(Pay01),
              Pay02_numcol = f1(Pay02),
              Pay03_numcol = f1(Pay03))
# Not working
df %>% mutate_at(vars(starts_with('Pay')), list(numcol = ~f1(.)))
Answer 0 (score: 0)
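The first approach that comes to mind is that this might be easier after reshaping the data. It still takes a tangle of tidyr functions, though, to 1) get a single column of "Pay00", "Pay01", etc.; 2) extract the number; 3) wrangle things so that tidyr::spread can take you back to a wide shape; and 4) spread and remove the "_value" bit I added.
I believe there is a better way with the newest version of tidyr, since the new pivot_wider function should be able to take multiple columns for value. I haven't played with it at all, but maybe someone else can write that up.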
library(tidyverse)
df %>%
  rowid_to_column() %>%
  gather(key, value, -AY, -rowid) %>%
  mutate(numcol = as.numeric(str_extract(key, "\\d+$"))) %>%
  gather(key = coltype, value, value, numcol) %>%
  unite(key, key, coltype) %>%
  spread(key, value) %>%
  select(AY, ends_with("value"), ends_with("numcol")) %>%
  rename_all(str_remove, "_value")
#> AY Pay00 Pay01 Pay02 Pay03 Pay00_numcol Pay01_numcol
#> 1 2018 2520.3772 2338.9490 919.8245 629.1657 0 1
#> 2 2016 259.7804 1543.4450 661.6488 2382.7916 0 1
#> 3 2018 2446.3075 312.5143 2297.9717 942.5627 0 1
#> 4 2017 1386.6288 4179.0352 2370.2669 1846.5838 0 1
#> 5 2018 541.8261 2104.4589 2622.1758 2606.0694 0 1
#> Pay02_numcol Pay03_numcol
#> 1 2 3
#> 2 2 3
#> 3 2 3
#> 4 2 3
#> 5 2 3
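The pivot_wider idea mentioned above might look roughly like the sketch below. It is not from the original answer: it assumes tidyr 1.1.0 or later (for the names_glue argument) and dplyr 1.0.0 or later (for rename_with), and the name wide is only illustrative.

```r
library(dplyr)
library(tidyr)
library(stringr)

# Same toy data as in the question
set.seed(20190928)
n <- 5
df <- data.frame(
  AY = sample(2016:2019, n, replace = TRUE),
  Pay00 = rgamma(n, 2, 1/1000),
  Pay01 = rgamma(n, 2, 1/1000),
  Pay02 = rgamma(n, 2, 1/1000),
  Pay03 = rgamma(n, 2, 1/1000)
)

wide <- df %>%
  # a row id keeps rows with the same AY apart while reshaping
  mutate(rowid = row_number()) %>%
  # one row per (rowid, PayXX) pair
  pivot_longer(starts_with("Pay"), names_to = "key", values_to = "value") %>%
  # pull the trailing digits out of the former column name
  mutate(numcol = as.numeric(str_extract(key, "\\d+$"))) %>%
  # spread both value columns at once; names become e.g. Pay00_value, Pay00_numcol
  pivot_wider(
    names_from = key,
    values_from = c(value, numcol),
    names_glue = "{key}_{.value}"
  ) %>%
  select(-rowid) %>%
  # drop the "_value" suffix so the payment columns keep their original names
  rename_with(~ str_remove(.x, "_value$"))

wide
```

Compared with the gather/spread version, this needs only one long step and one wide step, because pivot_wider accepts several columns in values_from.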
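Or, if you'd rather use a tidyeval approach: capture the name of the column as it is called. Note that if you use the list(numcol = ~f1(.)) notation, the expression the helper captures is always the placeholder . rather than the column name, so there are no digits to extract and every result is NA. Pass the function itself instead: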
f1 <- function(pmt) {
  str_extract(rlang::as_name(enquo(pmt)), "\\d+$") %>%
    as.numeric()
}
df %>%
  mutate_at(vars(starts_with("Pay")), list(numcol = f1))
# same output as the previous approach
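As a small illustration of the point about the formula notation, the sketch below first shows what the helper sees when it is wrapped in a lambda, and then one possible alternative that avoids capturing expressions altogether. The alternative is not part of the original answer: it assumes dplyr 1.0.0 or later, where across() and cur_column() are available. The two-row df and the names lambda and out are hypothetical stand-ins for illustration.

```r
library(dplyr)
library(rlang)
library(stringr)

# Tiny stand-in for the question's data (hypothetical values)
df <- data.frame(AY = c(2018, 2016), Pay00 = c(1, 2), Pay03 = c(3, 4))

# Helper from the answer: extracts digits from the *expression* it was called with
f1 <- function(pmt) {
  as.numeric(str_extract(as_name(enquo(pmt)), "\\d+$"))
}

# Wrapped in a lambda, the helper is called as f1(.), so the captured
# expression is the symbol `.`, which has no digits: the result is NA
lambda <- as_function(~ f1(.))
lambda(df$Pay00)

# Assumed alternative: ask dplyr for the current column name instead of
# capturing an expression, and name the new columns with .names
out <- df %>%
  mutate(across(
    starts_with("Pay"),
    ~ as.numeric(str_extract(cur_column(), "\\d+$")),
    .names = "{.col}_numcol"
  ))

out
```

Because cur_column() returns the column name as a string, this variant does not depend on how the function is quoted or inlined.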