Passing variables to tidyr's gather() to rename the key/value columns?

Date: 2016-06-10 19:42:56

Tags: r tidyr

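I would like to call tidyr::gather() inside a custom function, passing it a pair of character variables that will be used to rename the key and value columns. For example: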

myFunc <- function(mydata, key.col, val.col) {
    new.data <- tidyr::gather(data = mydata, key = key.col, value = val.col)
    return(new.data)    
}

However, this does not work as intended:

temp.data <- data.frame(day.1 = c(20, 22, 23), day.2 = c(32, 22, 45), day.3 = c(17, 9, 33))

# Call my custom function, renaming the key and value columns 
# "day" and "temp", respectively
long.data <- myFunc(mydata = temp.data, key.col = "day", val.col = "temp")

# Columns have *not* been renamed as desired
head(long.data)
  key.col val.col
1   day.1      20
2   day.1      22
3   day.1      23
4   day.2      32
5   day.2      22
6   day.2      45

Desired output:

head(long.data)
    day temp
1 day.1   20
2 day.1   22
3 day.1   23
4 day.2   32
5 day.2   22
6 day.2   45

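My understanding is that gather() uses bare variable names for most arguments (as in this example, where it uses "key.col" as the column name instead of the value stored in key.col). I have tried several ways of passing the value into the gather() call, but most of them return errors. For example, these three variants of the gather() call inside myFunc all return Error: Invalid column specification (for simplicity I'm ignoring the value argument, which behaves the same way):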

gather(data = mydata, key = as.character(key.col), value = val.col)

gather(data = mydata, key = as.name(key.col), value = val.col)

gather(data = mydata, key = as.name(as.character(key.col)), value = val.col)
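To see why the column ends up named "key.col", here is a base-R sketch of the same kind of argument capture. capture_name() and wrapper() are hypothetical helpers, not part of tidyr; they only mimic a function that records the expression passed to an argument instead of evaluating it, which is roughly what gather() does with `key` and `value`.

```r
# Hypothetical helper: like gather()'s `key`, it records the *expression*
# passed to the argument rather than the value that expression holds.
capture_name <- function(key) deparse(substitute(key))

# Hypothetical wrapper, analogous to myFunc() in the question.
wrapper <- function(key.col) capture_name(key = key.col)

capture_name(day)   # "day": a bare name is captured as written
wrapper("day")      # "key.col": the variable's name, not its contents
```

Wrapping the argument in as.character() or as.name() doesn't help, because the function still sees that whole call expression rather than the string "day".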

As a workaround, I simply rename the columns after calling gather():

colnames(long.data)[colnames(long.data) == "key"] <- "day"

But since gather() is supposed to be able to rename the key/value columns itself, how can I do that in the gather() call inside a custom function?

4 Answers:

Answer 0 (score: 2)

To put this inside a function, you have to use gather_():

library(tidyr)

myFunc <- function(mydata, key.col, val.col, gather.cols) {
  new.data <- gather_(data = mydata,
                      key_col = key.col,
                      value_col = val.col,
                      gather_cols = colnames(mydata)[gather.cols])
  return(new.data)    
}

temp.data <- data.frame(day.1 = c(20, 22, 23), day.2 = c(32, 22, 45),
                        day.3 = c(17, 9, 33))
temp.data

  day.1 day.2 day.3
1    20    32    17
2    22    22     9
3    23    45    33

# Call my custom function, renaming the key and value columns 
# "day" and "temp", respectively

long.data <- myFunc(mydata = temp.data, key.col = "day", val.col = "temp",
                    gather.cols = 1:3)
# Columns *have* been renamed as desired
head(long.data)

    day temp
1 day.1   20
2 day.1   22
3 day.1   23
4 day.2   32
5 day.2   22
6 day.2   45

As shown above, the main difference is that with gather_() you must specify the columns to gather through the gather_cols argument.

Answer 1 (score: 1)

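Most, if not all, of Hadley's functions that take bare variable names as arguments (dplyr's functions, for example) have a function_() version that uses standard evaluation and is "suitable for programming". So all you need is: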

myFunc <- function(mydata, key.col, val.col) {
  tidyr::gather_(data = mydata, key_col = key.col,
                 value_col = val.col, gather_cols = colnames(mydata))         
}

The only "catch" here is that gather_cols must be specified, whereas with gather() it is optional or can simply be supplied through `...`.

Then:

> myFunc(mydata = temp.data, key.col = "day", val.col = "temp")
    day temp
1 day.1   20
2 day.1   22
3 day.1   23
4 day.2   32
5 day.2   22
6 day.2   45
7 day.3   17
8 day.3    9
9 day.3   33

Answer 2 (score: 0)

Note that the underscore versions are now deprecated (at least as of tidyr version 0.8.2). See, for example, ?gather_
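For newer code, a sketch of the same function using pivot_longer(), assuming tidyr >= 1.0.0, where pivot_longer() supersedes gather(). Its names_to and values_to arguments take ordinary character strings, so the variables can be passed straight through:

```r
# Sketch assuming tidyr >= 1.0.0. names_to / values_to are plain strings,
# so key.col and val.col are evaluated normally; no quoting tricks needed.
myFunc <- function(mydata, key.col, val.col) {
  tidyr::pivot_longer(data = mydata,
                      cols = tidyr::everything(),  # pivot every column
                      names_to = key.col,
                      values_to = val.col)
}

temp.data <- data.frame(day.1 = c(20, 22, 23), day.2 = c(32, 22, 45),
                        day.3 = c(17, 9, 33))
long.data <- myFunc(mydata = temp.data, key.col = "day", val.col = "temp")
long.data
```

Unlike gather(), pivot_longer() returns a tibble and orders the rows by the original row first (day.1, day.2, day.3, day.1, ...), so sort the result if you rely on the old ordering.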

Answer 3 (score: 0)

...I had the same problem and have now found the answer here: https://dplyr.tidyverse.org/articles/programming.html

You can make dplyr evaluate a symbol by unquoting it with `!!` (two exclamation marks). In your original question, the code would be:
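A minimal sketch of how that might look, assuming tidyr >= 0.7.0, where gather()'s `key` and `value` arguments support quasiquotation and accept either strings or symbols:

```r
# Sketch assuming tidyr >= 0.7.0. `!!` unquotes the variables, so gather()
# receives the strings they hold ("day", "temp") rather than the names
# key.col / val.col. With no columns selected, gather() uses all columns.
myFunc <- function(mydata, key.col, val.col) {
  tidyr::gather(data = mydata, key = !!key.col, value = !!val.col)
}

temp.data <- data.frame(day.1 = c(20, 22, 23), day.2 = c(32, 22, 45),
                        day.3 = c(17, 9, 33))
long.data <- myFunc(mydata = temp.data, key.col = "day", val.col = "temp")
head(long.data)
```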
