Passing a data.frame / list as arguments to a function

Time: 2018-06-11 15:23:47

Tags: r

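In short, I have a larger function that creates data.frames which are subsets of a larger data.frame and names them after the function's arguments. It builds data frames of the raw data as well as of the Holt-Winters output and the forecast output, which means it creates multiple data.frames. Below is a small example (although there are not enough intervals here to actually produce a ts-class data.frame):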

Group <- c("Primary_Group","Primary_Group","Primary_Group","Primary_Group","Primary_Group","Primary_Group","Secondary_Group","Secondary_Group","Secondary_Group","Secondary_Group","Secondary_Group","Secondary_Group","Tertiary_Group","Tertiary_Group","Tertiary_Group","Tertiary_Group","Tertiary_Group","Tertiary_Group")
Day <- c(1,2,3,1,2,3,1,2,3,1,2,3,1,2,3,1,2,3)
Type <- c("A","A","A","B","B","B","A","A","A","B","B","B","A","A","A","B","B","B")
Value <- c(7,3,10,3,9,4,0,9,3,10,1,6,3,4,10,2,3,1)
df <- as.data.frame(cbind(Group,Day,Type,Value))

Fun <- function(Group,Type, A, B, G){
    df <- Data[Data$Group== Group & Data$Type== Type, ]
    assign(paste(Group,Type,"_df",sep = ''), df, envir = parent.frame()) 
    df_holtwinters <- HoltWinters(ts(Data[Data$Group== Group & Data$Type== Type, ], 
                                  frequency = 365), alpha = A, beta = B, gamma = G)
    assign(paste(Group,Type,"_hw",sep = ''), df_holtwinters, envir = parent.frame()) 
}

You'll notice that the Group and Type arguments are characters, while A, B, and G are numeric or NULL.

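If I now have a data.frame made up of list values, how can I best loop the function above (probably with mapply) so that it uses the value from each column in row 1, then each column in row 2, and so on, creating multiple data frames? Ideally this would produce one _df and one _hw data.frame per Group/Type combination, from arguments like these: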

argGroup <- c("Primary_Group","Primary_Group","Secondary_Group","Secondary_Group","Tertiary_Group","Tertiary_Group")
argType <- c("A","B","A","B","A","B")
argA <- c(NA, NA, NA, NA, NA, NA)
argB <- c(0.05, 0.05, NA, NA, NA, NA)  # c() silently drops NULL, so use NA as the placeholder here
argG <- c(NA, NA, NA, NA, NA, NA)

argGroup[is.na(argGroup)] <- list(NULL)
argType[is.na(argType)] <- list(NULL)
argA[is.na(argA)] <- list(NULL)
argB[is.na(argB)] <- list(NULL)
argG[is.na(argG)] <- list(NULL)

Arguments <- cbind(argGroup, argType, argA, argB, argG)

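The data.frames I expect are:

Primary_Group_A_df
Primary_Group_A_hw
Primary_Group_B_df
Primary_Group_B_hw
Secondary_Group_A_df
Secondary_Group_A_hw
Secondary_Group_B_df
Secondary_Group_B_hw
Tertiary_Group_A_df
Tertiary_Group_A_hw
Tertiary_Group_B_df
Tertiary_Group_B_hw

It would also be helpful to understand how best (most automated) to combine all of the _df objects together and all of the _hw objects together.

Any help would be amazing and greatly appreciated. Thank you very much!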

2 answers:

Answer 0: (score: 0)

You lost the type information by using as.data.frame(cbind(...)); just use data.frame directly:

Data <- data.frame(
  Group = rep(c("Primary_Group", "Secondary_Group", "Tertiary_Group"), each = 6L),
  Day = rep(1L:3L, 6L),
  Type = rep(rep(c("A", "B"), each = 3L), 3L),
  Value = c(7,3,10,3,9,4,0,9,3,10,1,6,3,4,10,2,3,1)
)
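To see the loss of type information concretely, here is a minimal sketch using made-up three-row vectors (not the data above). cbind on a mix of numeric and character vectors builds a character matrix, so every column ends up as character; data.frame keeps each column's own type.

```r
# Illustrative vectors only; the names mirror the question's columns.
Day   <- c(1, 2, 3)
Type  <- c("A", "B", "A")
Value <- c(7, 3, 10)

# cbind() first builds a matrix, and a matrix has a single type: character here.
via_cbind <- as.data.frame(cbind(Day, Type, Value), stringsAsFactors = FALSE)

# data.frame() stores each vector as its own column with its own type.
direct <- data.frame(Day, Type, Value, stringsAsFactors = FALSE)

sapply(via_cbind, class)  # all columns are character
sapply(direct, class)     # Day and Value stay numeric
```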

After that, I think you can do the following:

split_data <- split(Data, as.list(Data[, c("Group", "Type")]))
dfs <- do.call(rbind, split_data)

dfs_hw <- lapply(split_data, function(sub_data) {
  Map(argA, argB, argG, f = function(A, B, G) {
    HoltWinters(ts(sub_data, frequency = 365), alpha = A, beta = B, gamma = G)
  })
})

dfs_hw <- do.call(rbind, unlist(dfs_hw, recursive = FALSE))

But I get an error from HoltWinters, so I can't say for sure. Also, I think dfs is just Data again, only reordered.

Answer 1: (score: 0)
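The code above pairs every subset with every parameter value. If the intent is one parameter set per Group/Type subset, a sketch along the following lines may be closer. Everything in it is an assumption made for illustration: the toy data, the params list, and the stand-in function describe_fit are hypothetical, and describe_fit only reports which smoothing parameters were fixed instead of fitting a model. The point is the mechanics: a list can hold NULL where a vector cannot, and do.call passes those NULLs through as arguments.

```r
# Toy data: two groups x two types, four observations each (made up).
Data <- data.frame(
  Group = rep(c("Primary_Group", "Secondary_Group"), each = 8L),
  Type  = rep(rep(c("A", "B"), each = 4L), 2L),
  Value = c(7, 3, 10, 3, 9, 4, 0, 9, 3, 10, 1, 6, 3, 4, 10, 2),
  stringsAsFactors = FALSE
)

# One data frame per Group/Type combination, named e.g. "Primary_Group.A".
split_data <- split(Data, Data[c("Group", "Type")])

# One parameter set per subset; list() keeps NULL entries.
params <- list(
  Primary_Group.A   = list(alpha = NULL, beta = 0.05, gamma = NULL),
  Primary_Group.B   = list(alpha = NULL, beta = 0.05, gamma = NULL),
  Secondary_Group.A = list(alpha = NULL, beta = NULL, gamma = NULL),
  Secondary_Group.B = list(alpha = NULL, beta = NULL, gamma = NULL)
)

# Hypothetical stand-in for the real model call: it only describes its inputs.
describe_fit <- function(x, alpha = NULL, beta = NULL, gamma = NULL) {
  data.frame(
    n           = length(x),
    alpha_fixed = !is.null(alpha),
    beta_fixed  = !is.null(beta),
    gamma_fixed = !is.null(gamma)
  )
}

# Walk the subsets and their parameter sets in parallel, matched by name.
fits <- Map(
  function(sub, p) do.call(describe_fit, c(list(x = sub$Value), p)),
  split_data,
  params[names(split_data)]
)

# Stack the per-subset results into one data frame.
summary_df <- do.call(rbind, fits)
```

Replacing describe_fit with a real model call would keep the same structure, provided each subset contains enough observations for that model.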

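Avoid flooding your global environment with many similarly structured objects. Consider using a container such as a list to hold the many data frames. One useful approach is by, which subsets a data frame by one or more factors (here Group and Type) and returns a list of data frames. Also, instead of iterating by row, merge the arguments with the data so that each subset receives its arguments in a single pass.

Specifically, call by twice, once for the df list and once for the hw list. But first, merge the df and Arguments data frames by Group and Type. One challenge is that NULL cannot be stored in a data frame, so consider saving the string "NULL" and assigning temporary variables to pass into the HoltWinters arguments. Unfortunately, this converts the entire column to character type, so you will need as.numeric to convert the non-NULL values.

Merge

Group <- c("Primary_Group","Primary_Group","Secondary_Group","Secondary_Group",
           "Tertiary_Group","Tertiary_Group")
Type <- c("A","B","A","B","A","B")
argA <- c("NULL", "NULL", "NULL", "NULL", "NULL", "NULL")
argB <- c(0.05, 0.05, "NULL", "NULL", "NULL", "NULL")
argG <- c("NULL", "NULL", "NULL", "NULL", "NULL", "NULL")

Arguments <- data.frame(Group, Type, argA, argB, argG, stringsAsFactors=FALSE)

df <- merge(df, Arguments, by=c("Group", "Type"))

Dataframe list (with named df elements)

# ORDER FOR NAMING LATER
df <- with(df, df[order(Type, Group),])

# DATAFRAME LIST
df_list <- by(df, df[c("Group", "Type")], identity)

# RENAME LIST
df_list <- setNames(df_list, unique(paste0(df$Group, "_", df$Type, "_df")))

# REFERENCE ELEMENTS
df_list$Primary_Group_A_df
df_list$Secondary_Group_A_df
df_list$Tertiary_Group_A_df
...

HW list (with named hw elements)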

# HW LIST
hw_list <- by(df, df[c("Group", "Type")], function(sub) {
  # CONDITIONALLY ASSIGN TEMP VARIABLES 
  # (BEING SUBSETS: max(arg*)==min(arg*)==mean(arg*)==median(arg*))
  if(!is.na(max(sub$argA)) & max(sub$argA) == "NULL") { tmpA <- NULL } 
  else { tmpA <- max(as.numeric(sub$argA)) }

  if(!is.na(max(sub$argB)) & max(sub$argB) == "NULL") { tmpB <- NULL } 
  else { tmpB <- max(as.numeric(sub$argB)) }

  if(!is.na(max(sub$argG)) & max(sub$argG) == "NULL") { tmpG <- NULL } 
  else { tmpG <- max(as.numeric(sub$argG)) }

  # PASS ARGS ONCE PER SUBSET 
  return(HoltWinters(ts(sub, frequency = 365), alpha=tmpA, beta=tmpB, gamma=tmpG))
})

# RENAME LIST
hw_list <- setNames(hw_list, unique(paste0(df$Group, "_", df$Type, "_hw")))

# REFERENCE ELEMENTS
hw_list$Primary_Group_A_hw
hw_list$Secondary_Group_A_hw
hw_list$Tertiary_Group_A_hw
...
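The three conditional blocks in the HW list code repeat the same conversion from a "NULL" string to either NULL or a number. As a sketch only, that step could be factored into a small helper. The name parse_arg is hypothetical and not part of the original answer; it assumes each subset carries a single distinct value per argument column, as the merge produces.

```r
# Hypothetical helper: convert a column that stores "NULL" as text into
# either NULL or one numeric value for the whole subset.
parse_arg <- function(x) {
  x <- unique(x)
  if (length(x) != 1L) stop("expected one distinct value per subset")
  if (is.na(x) || x == "NULL") NULL else as.numeric(x)
}

parse_arg(c("NULL", "NULL"))  # NULL
parse_arg(c("0.05", "0.05"))  # 0.05
```

Inside the by() callback this would read, for example, tmpB <- parse_arg(sub$argB).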

Output (using 3 for the HW frequency to align with the posted data)

> hw_list$Primary_Group_A_hw
Holt-Winters exponential smoothing with trend and additive seasonal component.

Call:
HoltWinters(x = ts(sub[c("Group", "Day", "Type", "Value")], frequency = 3),     alpha = tmpA, beta = tmpB, gamma = tmpG)

Smoothing parameters:
 alpha: 0.2169231
 beta : 0.05
 gamma: 0.1

Coefficients:
          [,1]
a   2.89129621
b   0.08783715
s1  0.54815382
s2 -0.12485260
s3  0.21087038

> hw_list$Secondary_Group_A_hw
Holt-Winters exponential smoothing with trend and additive seasonal component.

Call:
HoltWinters(x = ts(sub[c("Group", "Day", "Type", "Value")], frequency = 3),     alpha = tmpA, beta = tmpB, gamma = tmpG)

Smoothing parameters:
 alpha: 0.752124
 beta : 0
 gamma: 0

Coefficients:
            [,1]
a   3.691664e+00
b   3.333333e-01
s1  3.333333e-01
s2 -1.480388e-16
s3 -3.333333e-01

> hw_list$Tertiary_Group_A_hw
Holt-Winters exponential smoothing with trend and additive seasonal component.

Call:
HoltWinters(x = ts(sub[c("Group", "Day", "Type", "Value")], frequency = 3),     alpha = tmpA, beta = tmpB, gamma = tmpG)

Smoothing parameters:
 alpha: 0.3145406
 beta : 0
 gamma: 0

Coefficients:
            [,1]
a   3.022946e+00
b  -3.333333e-01
s1 -3.333333e-01
s2 -1.480388e-16
s3  3.333333e-01
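For the part of the question about combining all the hw results, one possible sketch is to pull the smoothing parameters and the SSE out of each fitted object and stack them into one data frame. The built-in co2 and nottem series are stand-ins here (assumptions, not the poster's data), and all three smoothing parameters are fixed to arbitrary values so that no optimisation is involved.

```r
# Stand-in series from the datasets package; the list names are illustrative.
series <- list(co2_hw = co2, nottem_hw = nottem)

# Fit every series with the same fixed (arbitrary) smoothing parameters.
hw_list <- lapply(series, HoltWinters, alpha = 0.5, beta = 0.05, gamma = 0.1)

# One row per fitted model: its name, smoothing parameters, and SSE.
hw_summary <- do.call(rbind, lapply(names(hw_list), function(nm) {
  fit <- hw_list[[nm]]
  data.frame(
    name  = nm,
    alpha = unname(fit$alpha),
    beta  = unname(fit$beta),
    gamma = unname(fit$gamma),
    SSE   = fit$SSE
  )
}))

hw_summary
```

The same pattern would apply to hw_list from the answer, since by() also returns a list-like object of fitted models.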
