Question

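I'm gradually moving from SAS to R, and right now I'm trying to replicate what I used to do with macros.

I have a table containing all my data (let's call it IDF_pop), and from it I create two other tables, YVE_pop and EPCI_pop, which are two subsets of the main table. I prefer to create separate tables, though I suppose that may not be the best choice. Here is how I do it: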

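## Packages used below: reshape2 for melt(), dplyr for %>%, filter(),
## group_by() and summarise(), stringi for stri_sub()
library(reshape2)
library(dplyr)
library(stringi)

## Let's say the main table contains 10 rows.
## codgeo is the city's postal code, epci is the area, and I have three
## variables that describe different parts of the population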

codgeo <- c("75014","75020","78300","78520","78650","91200","91600","92500","93100","95230")
epci <- c("001","001","002","002","003","004","004","005","006","007")
pop0_15 <- c(10000*runif(10))
pop15_64 <- c(10000*runif(10))
pop65p <- c(10000*runif(10))

IDF_pop <- data.frame(codgeo,epci,pop0_15,pop15_64,pop65p)

## I'd like my population to be in one single column, for this I'll use melt

IDF_pop_line <- melt(IDF_pop,c("codgeo","epci"))

## Now I want to create separate tables for the Yvelines department (codgeo starts with 78) and for EPCI 002
## I could do it in two lines but I wanted to train using functions so here goes

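localisation <- function(code_dep, lib_dep, code_epci, lib_epci){

  # Department subset: rows whose codgeo starts with code_dep.
  # The suffix is "_pop_line" so the name matches IDF_pop_line and the
  # tables that agregation() looks up later.
  do.call("<<-",
          list(paste0(eval(lib_dep),"_pop_line"),
               IDF_pop_line %>% filter(stri_sub(codgeo,from=1,length=2)==code_dep)
          )
  )

  # EPCI subset: rows whose epci equals code_epci
  do.call("<<-",
          list(paste0(eval(lib_epci),"_pop_line"),
               IDF_pop_line %>% filter(epci==code_epci)
          )
  )

}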

do.call("localisation",list("78","YVE","002","GPSO"))

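With this, I have 3 tables (IDF_, YVE_, GPSO_) and can get to the main problem.

What I want to do next is summarise my tables. I'm trying to write one function that works for all 3 tables.

I'd like it to depend entirely on its argument, but it seems that do.call won't accept paste0 in its second argument.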

## Aggregating the tables. I'll call the function 3 times, one for each level.

agregation <- function(lib){

  # This version doesn't work:

  do.call("<<-",
          list(paste0(eval(lib),"_pop_agr"),
               paste0(eval(lib),"_pop_line") %>%
                 group_by(variable) %>%
                 summarise(pop = sum(value))
          )
  )

}

do.call("agregation",list("IDF")) # This one doesn't work

agregation2 <- function(lib){

  do.call("<<-",
          list(paste0(eval(lib),"_pop_agr"),
               IDF_pop_line %>%
                 group_by(variable) %>%
                 summarise(pop = sum(value))
          )
  )

}

do.call("agregation2",list("IDF")) # This one does

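As you can see, the only approach I've found that works so far is to write out the full name of the table being aggregated. But that defeats the original idea of having something I can parameterise freely. How can I modify the first version of the function so that it works for all three possible arguments?

Finally, I know a simple workaround would be to keep my IDF_pop_line table and filter at the last moment to create the 3 aggregated tables, but I'd prefer to work with separate tables from the start.

Thanks in advance for your help!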

Answer 1

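In the agregation function, paste0(eval(lib),"_pop_line") returns a character string holding the name of the data frame, not the data frame itself, so the pipeline receives text rather than a table. Try get, which looks up the object that has that name: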

agregation <- function(lib){

  do.call("<<-",
          list(paste0(eval(lib),"_pop_agr"),
               get(paste0(eval(lib),"_pop_line")) %>%
                 group_by(variable) %>%
                 summarise(pop = sum(value))
          )
  )

}
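
To make the name-versus-object distinction concrete, here is a minimal base-R sketch. The table demo_pop_line and the helper aggregate_by_name are hypothetical names made up for illustration, and the numbers are invented. The helper returns its result instead of assigning it with `<<-`; that is a design choice of this sketch, not part of the answer above.

```r
# Hypothetical toy table standing in for IDF_pop_line (made-up numbers)
demo_pop_line <- data.frame(
  variable = c("pop0_15", "pop0_15", "pop65p"),
  value    = c(10, 20, 5)
)

table_name <- paste0("demo", "_pop_line")

# paste0() only builds a character string ...
is.character(table_name)        # TRUE
# ... while get() looks up the object that has that name
is.data.frame(get(table_name))  # TRUE

# Hypothetical helper: fetch the table by name, sum value per variable,
# and return the result rather than writing it to the global environment
aggregate_by_name <- function(lib) {
  tab <- get(paste0(lib, "_pop_line"))
  aggregate(value ~ variable, data = tab, FUN = sum)
}

demo_pop_agr <- aggregate_by_name("demo")
```

Returning the value leaves the choice of result name to the caller (here demo_pop_agr), so the function has no side effects on the global environment.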

Answer 2

Here is a suggestion using data.table.

You can start from IDF_pop as it was created, before any of your functions are run.

library(data.table)

#make a data.table out of IDF_pop
setDT( IDF_pop )

#create groups to summarise by
IDF_pop[ epci == "002", GSPO := TRUE][]
IDF_pop[ grepl("^78", codgeo) , YVE := TRUE][]

#melt and filter only values where a filter is TRUE
dt <- data.table::melt( IDF_pop, 
                        id.vars = c("codgeo", "epci", "pop0_15", "pop15_64", "pop65p"),
                        measure.vars = c("GSPO", "YVE"))[ value == TRUE,][]

Intermediate result (dt)

#    codgeo epci  pop0_15 pop15_64   pop65p variable value
# 1:  78300  002 6692.394 5441.225 4008.875     GSPO  TRUE
# 2:  78520  002 2128.604 6808.004 1889.822     GSPO  TRUE
# 3:  78300  002 6692.394 5441.225 4008.875      YVE  TRUE
# 4:  78520  002 2128.604 6808.004 1889.822      YVE  TRUE
# 5:  78650  003 8482.971 6556.482 5098.929      YVE  TRUE

Code

#now summarising is easy, sum by variable-group on all pop-columns
dt[, lapply( .SD, sum), by = variable, .SDcols = names(dt)[grepl("^pop", names(dt) )] ]

Final output

#    variable   pop0_15 pop15_64   pop65p
# 1:     GSPO  7171.683 5855.894 11866.55
# 2:      YVE 12602.153 8028.948 14364.21
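
For readers without data.table, the same idea (one logical filter per group, then a sum over every pop column) can be sketched in base R. The table demo_pop and its numbers are hypothetical, chosen only so the result is reproducible; they are not the random figures shown above.

```r
# Hypothetical fixed data (made-up numbers, not the figures shown above)
demo_pop <- data.frame(
  codgeo  = c("75014", "78300", "78520", "78650"),
  epci    = c("001", "002", "002", "003"),
  pop0_15 = c(100, 200, 300, 400),
  pop65p  = c(10, 20, 30, 40)
)

# One logical filter per group, mirroring the GSPO and YVE flags
groups <- list(
  GSPO = demo_pop$epci == "002",
  YVE  = startsWith(demo_pop$codgeo, "78")
)

# All columns whose name starts with "pop"
pop_cols <- grep("^pop", names(demo_pop), value = TRUE)

# Sum every pop column over the rows selected by each filter;
# the result is a matrix with one row per group
demo_agr <- t(sapply(groups, function(keep) colSums(demo_pop[keep, pop_cols])))
```

Adding another level of aggregation then only means adding one more named filter to groups.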

How do I use do.call correctly inside a function?
