Question

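I am trying to create a function that uses readxl::read_excel to read every sheet in an Excel workbook and bind them into a single data frame, while also letting me pass additional arguments through to read_excel. I can do the first part fine, but not the second.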

I expect to get back a single data frame, but the function below raises an error instead:

library(magrittr)

# example excel workbook with multiple sheets
path <- readxl::readxl_example("datasets.xlsx")

# function with simple forwarding
read_all <- function(path, ...) {

  path %>%
    readxl::excel_sheets() %>%
    rlang::set_names() %>%
    purrr::map_df(~ readxl::read_excel(path = path, sheet = .x, ...))

}

# errors with and without additional arguments
read_all(path)
read_all(path, skip = 5)

Both calls fail with the same error:

Error: Can't guess format of this cell reference: iris
In addition: Warning message:
Cell reference follows neither the A1 nor R1C1 format. Example: iris
NAs generated.

The function works fine when no extra arguments are forwarded:

# Function works without passing extra params
read_all_0 <- function(path) {

  path %>%
    readxl::excel_sheets() %>%
    rlang::set_names() %>%
    purrr::map_df(~ readxl::read_excel(path = path, sheet = .x))

}

read_all_0(path)

Argument passing also works fine in a simple function that does not use purrr::map_df.

Answer 1

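I ran into a similar issue in a completely different context (no wrapper function and no ellipsis), where the wrong arguments were passed on via map. For reference, see {{3}}.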

The solution there was to create a named function that takes only one argument and to pass it to map, so that the only argument is the vector or list you are iterating over.

Applied to your problem, the solution looks like this:

# function with forwarding
read_all <- function(path, ...) {

  # function within function that sets the arguments path and ellipsis as given
  # and only leaves sheet to be determined
  read_xl <- function(sheet) {
    readxl::read_excel(path = path, sheet = sheet, ...)
  }

  path %>%
    readxl::excel_sheets() %>%
    rlang::set_names() %>%
    purrr::map_df(read_xl)

}

# this allows you to pass along arguments in the ellipsis correctly
read_all(path)
read_all(path, col_names = FALSE)

Update: this issue seems to stem from how purrr::as_mapper handles environments. To avoid it, I suggested in the comments using an anonymous function. Apparently, the approach below works as well.

read_all <- function(path, ...) {

  path %>%
    readxl::excel_sheets() %>%
    rlang::set_names() %>%
    purrr::map_df(function(x) {
      readxl::read_excel(path = path, sheet = x, ...)
    })

}

To verify that as_mapper is actually what causes the problem, we can rewrite the named function-in-function from above using as_mapper. This again produces errors, both with and without additional arguments in the ellipsis.

# function with forwarding
read_all <- function(path, ...) {

  # named mapper function
  read_xl <- purrr::as_mapper(~ readxl::read_excel(path = path, sheet = .x, ...))

  path %>%
    readxl::excel_sheets() %>%
    rlang::set_names() %>%
    purrr::map_df(read_xl)

}

Update: knowing that as_mapper causes the issue lets us dig deeper. We can now use the RStudio debugger to inspect what happens under the hood when running a simple mapper version of read_excel:

read_xl <- purrr::as_mapper(~ readxl::read_excel(path = .x, sheet = .y, ...))
debugonce(read_xl)
read_xl(path, 1)

It seems that when the mapper function contains the ellipsis, as_mapper maps the first argument not only to .x but also, automatically, to the ellipsis (...). We can verify this by creating a simple mapper function, paster, that takes two arguments, .x and ... .
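A minimal sketch of such a check, assuming purrr is installed. The name `paster` comes from the text above, but its body here is an assumption chosen for illustration, not the answer's original code.

```r
library(purrr)

# Assumed body: paste the first argument together with whatever is in `...`.
paster <- as_mapper(~ paste(.x, ...))

# With a single argument, the value shows up twice: once as `.x`
# and once more as part of `...`.
one_arg <- paster("a")
print(one_arg)   # "a a"

# With two arguments, the first one is still repeated.
two_args <- paster("a", "b")
print(two_args)  # "a a b"
```

If the first argument were bound only to `.x`, the first call would return just "a". The repetition is consistent with the read_excel error above, where the sheet name "iris" appears to reach read_excel a second time through `...` and is then treated as a cell range.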

The question now is: are we supposed to use the ellipsis differently in mapper functions, or is this a bug?

Answer 2

I thought the following would work:

library(readxl)
library(purrr)

read_all <- function(path, ...) {

  path %>%
    readxl::excel_sheets() %>%
    purrr::set_names() %>%
    map_df(~readxl::read_excel(path=path, sheet=.x), ...)

}

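My reasoning was that the map family of functions has a ... argument for passing additional arguments on to the mapped function. However, the following call ignores the n_max argument and still returns all rows of every sheet, rather than a data frame with 8 rows (two from each of the four sheets):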

p <- readxl_example("datasets.xlsx")
read_all(p, n_max=2)
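The same behaviour can be reproduced without any Excel file. The sketch below assumes only that purrr is installed, and the `n_max = 2` argument is purely illustrative: it is handed to the formula, but the formula's body never refers to `...`, so the value has nowhere to go.

```r
library(purrr)

# The extra argument is accepted by the lambda but never used in its body,
# so the result is the same as if it had not been supplied at all.
with_extra    <- map_chr(c("a", "b"), ~ toupper(.x), n_max = 2)
without_extra <- map_chr(c("a", "b"), ~ toupper(.x))

print(with_extra)
print(identical(with_extra, without_extra))
```

No error or warning is raised in this sketch, which may explain why the dropped n_max goes unnoticed.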

This, however, works:

read_all <- function(path, ...) {

  path %>% 
    excel_sheets() %>% 
    set_names() %>%
    map_df(read_excel, path=path, ...)

}

p <- readxl_example("datasets.xlsx")
read_all(path=p, n_max=2)

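In the code above, path and any additional arguments in ... are passed to read_excel, and (apparently) the sheet name, which would be .x if we referred to it explicitly, is implicitly passed to the sheet argument. I assume this is because path, the first argument, has already been supplied. I don't really understand this, and it doesn't seem like a particularly transparent approach, but I thought I'd post it here in case someone else can explain what is going on and offer better code.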
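One plausible explanation is R's ordinary argument matching: map calls the function with each element as an unnamed first argument, named arguments are matched first, and the unnamed element then fills the first formal that is still free. The sketch below illustrates this with `fake_reader`, a hypothetical stand-in that only mimics the first few parameter names of read_excel and reads no file; "book.xlsx" is likewise a made-up name.

```r
library(purrr)

# Hypothetical stand-in for read_excel: it just reports which value
# ended up in which parameter.
fake_reader <- function(path, sheet = NULL, n_max = Inf) {
  paste0(path, ":", sheet, ":", n_max)
}

# `path` and `n_max` are matched by name, so each element of the input
# vector falls through to the next unmatched parameter, `sheet`.
matched <- map_chr(c("iris", "mtcars"), fake_reader, path = "book.xlsx", n_max = 2)
print(matched)  # "book.xlsx:iris:2" "book.xlsx:mtcars:2"
```

Under this reading, the approach depends on sheet being the first parameter left unmatched once path has been named, which is why it works but is not very transparent.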
