I have built a function to which I want to pass a data frame and a column from that data frame. For example:
testdf <- structure(list(date = c("2016-04-04", "2016-04-04", "2016-04-04",
"2016-04-04", "2016-04-04", "2016-04-04"), sensorheight = c(1L,
16L, 1L, 16L, 1L, 16L), farm = c("McDonald", "McDonald", "McDonald",
"McDonald", "McDonald", "McDonald"), location = c("4", "4", "5",
"5", "Outside", "Outside"), Temp = c(122.8875, 117.225, 102.0375,
98.3625, 88.5125, 94.7)), .Names = c("date", "sensorheight",
"farm", "location", "Temp"), row.names = c(NA, 6L), class = "data.frame")
> testdf
        date sensorheight     farm location     Temp
1 2016-04-04            1 McDonald        4 122.8875
2 2016-04-04           16 McDonald        4 117.2250
3 2016-04-04            1 McDonald        5 102.0375
4 2016-04-04           16 McDonald        5  98.3625
5 2016-04-04            1 McDonald  Outside  88.5125
6 2016-04-04           16 McDonald  Outside  94.7000
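The function subtracts some values from others based on the values in a different column. It used to work, accepting a data frame and a column as input, but since I updated R it no longer works.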
DailyInOutDiff <- function (df, variable) {
  DailyInOutDiff04 <- df %>%
    filter(location %in% c(4, 'Outside')) %>%
    group_by(date, sensorheight, farm) %>%
    arrange(sensorheight, farm, location) %>%
    summarise(Diff = if(n()==1) NA else variable[location=="4"] - variable[location=='Outside'],
              location = "4") %>%
    select(1, 2, 3, 5, 4)
  DailyInOutDiff05 <- df %>%
    filter(location %in% c(5, 'Outside')) %>%
    group_by(date, sensorheight, farm) %>%
    arrange(sensorheight, farm, location) %>%
    summarise(Diff = if(n()==1) NA else variable[location=="5"] - variable[location=='Outside'],
              location = "5") %>%
    select(1, 2, 3, 5, 4)
  temp.list <- list(DailyInOutDiff04, DailyInOutDiff05)
  final.df = bind_rows(temp.list)
  return(final.df)
}
test <- DailyInOutDiff(testdf, "Temp")
test <- DailyInOutDiff(testdf, quote(Temp))
These two calls produce the following error messages, respectively:
Error in summarise_impl(.data, dots) :
Evaluation error: non-numeric argument to binary operator.
and
Error in summarise_impl(.data, dots) :
Evaluation error: object of type 'symbol' is not subsettable.
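I would like to know what these error messages mean and how to resolve them.
I tried the solutions in "Pass a data.frame column name to a function", but none of them worked for me.
If I remove the column as an input, no error occurs, but I need the column because I apply the function to multiple columns in a large data frame.
My desired output: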
        date sensorheight     farm location    Temp
1 2016-04-04            1 McDonald        4 34.3750
2 2016-04-04           16 McDonald        4 22.5250
3 2016-04-04            1 McDonald        5 13.5250
4 2016-04-04           16 McDonald        5  3.6625
Answer 0 (score: 2)
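I could not reproduce the second error, but I could reproduce the first. The summarise function cannot use Temp because it sees it as a character object. In other words, you are passing the column name, not the column. If you run the code inside the function line by line, using df$variable instead of variable, you will see that it works.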
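Both error messages can be reproduced in base R without dplyr, which may make their meaning clearer. This is only a sketch of the mechanism; the names msg_chr and msg_sym are illustrative and not part of the original function.

# A string subset with a logical index is still a string,
# so the subtraction becomes "Temp" - NA_character_.
variable <- "Temp"
msg_chr <- tryCatch(
  variable[c(TRUE, FALSE)] - variable[c(FALSE, TRUE)],
  error = function(e) conditionMessage(e)
)

# quote(Temp) is a symbol (a bare name), and symbols cannot be indexed with [.
sym <- quote(Temp)
msg_sym <- tryCatch(
  sym[c(TRUE, FALSE)],
  error = function(e) conditionMessage(e)
)

print(msg_chr)
print(msg_sym)

In both cases the function is indexing the object held in variable (a string or a symbol) rather than the Temp column of the data frame.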
That said, the solution is quite simple. I just added the line variable <- as.name(variable) to your function. It now reads:
DailyInOutDiff <- function (df, variable) {
  variable <- as.name(variable)
  DailyInOutDiff04 <- df %>%
    filter(location %in% c(4, 'Outside')) %>%
    group_by(date, sensorheight, farm) %>%
    arrange(sensorheight, farm, location) %>%
    summarise(Diff = if(n()==1) NA else variable[location=="4"] - variable[location=='Outside'],
              location = "4") %>%
    select(1, 2, 3, 5, 4)
  DailyInOutDiff05 <- df %>%
    filter(location %in% c(5, 'Outside')) %>%
    group_by(date, sensorheight, farm) %>%
    arrange(sensorheight, farm, location) %>%
    summarise(Diff = if(n()==1) NA else variable[location=="5"] - variable[location=='Outside'],
              location = "5") %>%
    select(1, 2, 3, 5, 4)
  temp.list <- list(DailyInOutDiff04, DailyInOutDiff05)
  final.df = bind_rows(temp.list)
  return(final.df)
}
The output is:
> test <- DailyInOutDiff(testdf, "Temp")
> test
Source: local data frame [4 x 5]
Groups: date, sensorheight [2]
        date sensorheight     farm location    Diff
       <chr>        <int>    <chr>    <chr>   <dbl>
1 2016-04-04            1 McDonald        4 34.3750
2 2016-04-04           16 McDonald        4 22.5250
3 2016-04-04            1 McDonald        5 13.5250
4 2016-04-04           16 McDonald        5  3.6625
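On dplyr 0.7 or later, a symbol stored in a function variable generally has to be unquoted explicitly before summarise treats it as a column; left as-is, it may trigger the "object of type 'symbol' is not subsettable" error from the question. A minimal sketch under that assumption, reduced to the location 4 case. The helper name diff_loc4 and the use of rlang::sym with !! are illustrative choices, not part of the answer above.

library(dplyr)

testdf <- data.frame(
  date = "2016-04-04",
  sensorheight = c(1L, 16L, 1L, 16L, 1L, 16L),
  farm = "McDonald",
  location = c("4", "4", "5", "5", "Outside", "Outside"),
  Temp = c(122.8875, 117.225, 102.0375, 98.3625, 88.5125, 94.7),
  stringsAsFactors = FALSE
)

# Hypothetical helper: same idea as as.name(), but the symbol is unquoted with !!
diff_loc4 <- function(df, variable) {
  col <- rlang::sym(variable)  # turns the string "Temp" into the symbol Temp
  df %>%
    filter(location %in% c("4", "Outside")) %>%
    group_by(date, sensorheight, farm) %>%
    summarise(
      # (!!col) is replaced by the column symbol before the expression is evaluated
      Diff = if (n() == 1) NA else (!!col)[location == "4"] - (!!col)[location == "Outside"],
      location = "4"
    ) %>%
    ungroup()
}

res4 <- diff_loc4(testdf, "Temp")
print(res4)

The parentheses around !!col matter: without them, !! would apply to the whole indexing expression instead of to the symbol alone.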
Answer 1 (score: 1)
If you are using the latest dplyr (0.7), you can use .data to refer to a column by a string holding its name. Your function would be modified to:
DailyInOutDiff <- function (df, variable) {
  DailyInOutDiff04 <- df %>%
    filter(location %in% c(4, 'Outside')) %>%
    group_by(date, sensorheight, farm) %>%
    arrange(sensorheight, farm, location) %>%
    summarise(Diff = if(n()==1) NA else .data[[variable]][location=="4"] - .data[[variable]][location=='Outside'],
              location = "4") %>%
    select(1, 2, 3, 5, 4)
  DailyInOutDiff05 <- df %>%
    filter(location %in% c(5, 'Outside')) %>%
    group_by(date, sensorheight, farm) %>%
    arrange(sensorheight, farm, location) %>%
    summarise(Diff = if(n()==1) NA else .data[[variable]][location=="5"] - .data[[variable]][location=='Outside'],
              location = "5") %>%
    select(1, 2, 3, 5, 4)
  temp.list <- list(DailyInOutDiff04, DailyInOutDiff05)
  final.df = bind_rows(temp.list)
  return(final.df)
}
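The change from variable[...] to .data[[variable]][...] means it now selects the column named by the string in variable, rather than trying to index the string itself. Running this function with the provided data returns: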
DailyInOutDiff(testdf, "Temp")
#> # A tibble: 4 x 5
#> # Groups: date, sensorheight [2]
#>         date sensorheight     farm location    Diff
#>        <chr>        <int>    <chr>    <chr>   <dbl>
#> 1 2016-04-04            1 McDonald        4 34.3750
#> 2 2016-04-04           16 McDonald        4 22.5250
#> 3 2016-04-04            1 McDonald        5 13.5250
#> 4 2016-04-04           16 McDonald        5  3.6625
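Since the question mentions applying the function to several columns, the .data approach could be combined with a loop over column names. The following is only a sketch: the helpers in_out_diff and all_diffs are hypothetical, and the Temp2 column is made up purely so there is a second column to loop over.

library(dplyr)

testdf <- data.frame(
  date = "2016-04-04",
  sensorheight = c(1L, 16L, 1L, 16L, 1L, 16L),
  farm = "McDonald",
  location = c("4", "4", "5", "5", "Outside", "Outside"),
  Temp = c(122.8875, 117.225, 102.0375, 98.3625, 88.5125, 94.7),
  stringsAsFactors = FALSE
)
testdf$Temp2 <- testdf$Temp * 2  # made-up second column, for illustration only

# Hypothetical helper: difference between one inside location and "Outside"
in_out_diff <- function(df, variable, loc) {
  df %>%
    filter(location %in% c(loc, "Outside")) %>%
    group_by(date, sensorheight, farm) %>%
    summarise(
      Diff = if (n() == 1) NA else .data[[variable]][location == loc] - .data[[variable]][location == "Outside"],
      location = loc
    ) %>%
    ungroup()
}

# Hypothetical wrapper: stack the results for locations "4" and "5"
all_diffs <- function(df, variable) {
  bind_rows(lapply(c("4", "5"), function(loc) in_out_diff(df, variable, loc)))
}

# One result data frame per column name
results <- lapply(c(Temp = "Temp", Temp2 = "Temp2"), function(v) all_diffs(testdf, v))
print(results$Temp)

Passing the location as an argument removes the duplicated pipeline, and because the column is referred to by a string, the same function can be mapped over any vector of column names.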
Answer 2 (score: 0)
The following calls invoke the function DailyInOutDiff, assigning testdf to df and "Temp" to variable.
test <- DailyInOutDiff(testdf, "Temp")
test <- DailyInOutDiff(testdf, quote(Temp))
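From what you describe, you want to pass a data frame and a column from that data frame. Currently you are only passing the column name, which is a string, not the column. You have to change it to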
test <- DailyInOutDiff(testdf, testdf["Temp"])
Second, you are passing the Temp column and trying to filter the variable data frame by location in the following snippet.
summarise(Diff = if(n()==1) NA else variable[location=="4"] - variable[location=='Outside'], location = "4")
It must be,
variable[variable$location=="4",]
if your call is,
test <- DailyInOutDiff(testdf, testdf["Temp"])
or
variable[variable$Temp=="4",]
if your call is,
test <- DailyInOutDiff(testdf, testdf["Temp"])