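I'm trying to achieve the following: I have a dataset and a function that subsets it by row names and then performs a series of operations on that subset. I can do this step by step (i.e. run the function separately for each subset), but I have a list of the subsets I need and I would like to loop over that list. It sounds complicated, so please look at the example below. This is what I can do: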
#dataframe with rownames
whole_dataset <- data.frame(wt1 = c(1, 2, 3, 6, 6),
wt2 = c(2, 3, 4, 4, 2))
row.names(whole_dataset) = c("HTA1", "HTA2", "HTB2", "CSE1", "CSE2")
# two different non-overlapping subsets
his <- c("HTA1", "HTA2", "HTB2")
cse <- c("CSE1", "CSE2")
#this is the function I have
fav_complex <- function (data, complex) {
small_data<- data[complex,] #subset only the rows that you need
sum.all<-colSums(small_data) #calculate sum of columns
return(sum.all)
}
#I generate two separate named vectors
his_data <- fav_complex(data = whole_dataset, complex = his)
cse_data <- fav_complex(data = whole_dataset, complex = cse)
#and merge them
merged_data<- rbind(his_data,cse_data)
It looks like this:
> merged_data
wt1 wt2
his_data 6 9
cse_data 12 6
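I would like to produce the merged_data data frame without having to call the 'fav_complex' function separately for each subset. In real life I have about 20 subsets and a lot of code is involved. Here is my attempt, which I can't get to work: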
#I first have a character vector listing all the variable names
subset_list <- c("his", "cse")
#then create a loop that goes over this list
#make an empty dataframe
merged_data2 <- data.frame()
#fill it with a for loop output
for (element in subset_list) {
result <- fav_complex(data = whole_dataset, element)
merged_data2 <-rbind(merged_data2, result)
}
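I know this is wrong. In this loop, 'element' is just a string, not the variable that holds the row names. But I don't know how to turn it into that variable. noquote(element) doesn't work. I tried reading about non-standard evaluation, eval() and substitute(), but it is too abstract for me; I don't think my R expertise is there yet.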
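A minimal sketch of the direct fix (not taken from either answer below), assuming the subset vectors `his` and `cse` live in the global environment: base R's `get()` returns the object whose name is stored in a string, which is the lookup the loop is missing. Starting from `NULL` instead of an empty data frame keeps `rbind()` simple.

```r
whole_dataset <- data.frame(wt1 = c(1, 2, 3, 6, 6),
                            wt2 = c(2, 3, 4, 4, 2))
row.names(whole_dataset) <- c("HTA1", "HTA2", "HTB2", "CSE1", "CSE2")
his <- c("HTA1", "HTA2", "HTB2")
cse <- c("CSE1", "CSE2")

fav_complex <- function(data, complex) {
  small_data <- data[complex, ]  # subset only the rows that you need
  colSums(small_data)            # calculate sum of columns
}

subset_list <- c("his", "cse")

# Start from NULL: rbind(NULL, x) just returns x as a one-row matrix
merged_data2 <- NULL
for (element in subset_list) {
  # get() looks up the object whose *name* is the string in `element`
  result <- fav_complex(data = whole_dataset, complex = get(element))
  merged_data2 <- rbind(merged_data2, result)
}
# Label each row with the name of the subset it came from
rownames(merged_data2) <- subset_list
```

`mget(subset_list)` performs the same lookup for all names at once and returns a named list, which can then be passed to `lapply()` instead of writing the loop by hand.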
Answer 0 (score: 2)
Consider by to run the required operation across all subsets. But first create a group column:
# ANY FUNCTION TO APPLY ON SUBSETS (REMOVE GROUP COL)
fav_complex_new <- function (sub) {
sum.all <- colSums(transform(sub, group=NULL))
return(sum.all)
}
# ASSIGN GROUPING
whole_dataset$group <- ifelse(row.names(whole_dataset) %in% his, "his",
ifelse(row.names(whole_dataset) %in% cse, "cse", NA))
# BY CALL
df_list <- by(whole_dataset, whole_dataset$group, FUN=fav_complex_new)
# COMBINE ALL DFs IN LIST
merged_data <- do.call(rbind, df_list)
Rextester demo (includes the OP's original solution and the solution above)
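The same result can be reached without a helper column if the row-name subsets are kept together in one named list instead of as separate variables. This is an illustrative sketch, not part of this answer; the name `subsets` is hypothetical. `lapply()` calls the function once per list element and keeps the list names, which `do.call(rbind, ...)` then uses as row names.

```r
whole_dataset <- data.frame(wt1 = c(1, 2, 3, 6, 6),
                            wt2 = c(2, 3, 4, 4, 2))
row.names(whole_dataset) <- c("HTA1", "HTA2", "HTB2", "CSE1", "CSE2")

# Hypothetical named list holding every row-name subset in one object
subsets <- list(his = c("HTA1", "HTA2", "HTB2"),
                cse = c("CSE1", "CSE2"))

fav_complex <- function(data, complex) {
  # drop = FALSE keeps a data frame even if there is only one column
  colSums(data[complex, , drop = FALSE])
}

# Each list element is passed as `complex`; `data` is supplied by name
results <- lapply(subsets, fav_complex, data = whole_dataset)

# rbind() turns the list names ("his", "cse") into row names
merged_data <- do.call(rbind, results)
```

Adding a new subset then means adding one entry to the list rather than one more variable and one more function call.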
Answer 1 (score: 1)
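Following the modified workflow suggested by @Gregor, would you consider a solution like this one, which involves a bit of extra data wrangling?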
dplyr can create a split-apply-combine summary of the data grouped by complex. It could look like this:
library(dplyr)
whole_dataset <- tibble(wt1 = c(1, 2, 3, 6, 6),
wt2 = c(2, 3, 4, 4, 2),
id = factor(c("HTA1", "HTA2", "HTB2", "CSE1", "CSE2")))
whole_dataset <- mutate(whole_dataset,
complex = case_when(
grepl("^HT", id) ~ "his",
grepl("^CSE", id) ~ "cse")
) %>%
group_by(factor(complex))
whole_dataset %>% summarize(sum_wt1 = sum(wt1),
sum_wt2 = sum(wt2))
# # A tibble: 2 x 3
# `factor(complex)` sum_wt1 sum_wt2
# <fct> <dbl> <dbl>
# 1 cse 12 6
# 2 his 6 9
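For comparison, a base-R sketch (not part of this answer) that produces the same sums with `aggregate()` and no extra packages. It assumes the grouping column is derived from the `id` prefix in the same way as the `case_when()` call above; groups come out sorted alphabetically, so `cse` precedes `his` as in the tibble.

```r
whole_dataset <- data.frame(wt1 = c(1, 2, 3, 6, 6),
                            wt2 = c(2, 3, 4, 4, 2),
                            id = c("HTA1", "HTA2", "HTB2", "CSE1", "CSE2"))

# Derive the complex from the id prefix, mirroring the case_when() call
whole_dataset$complex <- ifelse(grepl("^HT", whole_dataset$id), "his",
                         ifelse(grepl("^CSE", whole_dataset$id), "cse", NA))

# cbind() on the left-hand side sums both weight columns within each complex
sums <- aggregate(cbind(wt1, wt2) ~ complex, data = whole_dataset, FUN = sum)
```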