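I'm trying to write a custom function to pass to do() in dplyr. The end goal is to use it together with group_by() so that my custom function runs on different chunks of the data.
Here's what my dataset looks like: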
> head(data,4)
subject ps polarity rs log_rs
1 Danesh 1.0 regular 216.0000 5.375278
2 Danesh 0.9 regular 285.7143 5.654992
3 Danesh 0.8 regular 186.3354 5.227548
4 Danesh 0.7 regular 218.1818 5.385329
Code to generate this dataset:
data <- structure(list(subject = structure(c(2L, 2L, 2L, 2L, 2L, 2L,
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L), .Label = c("ChristinaP",
"Danesh", "Elizabeth", "Ina", "JaclynT", "JessicaS", "Rhea",
"Samuel", "Tyler", "Vinodh"), class = "factor"), ps = c(1, 0.9,
0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1, 1, 0.9, 0.8, 0.7, 0.6,
0.5, 0.4, 0.3, 0.2, 0.1), polarity = structure(c(1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L,
2L), .Label = c("regular", "reverse"), class = "factor"), rs = c(216,
285.714285714286, 186.335403726708, 218.181818181818, 183.673469387755,
194.174757281553, 202.020202020202, 184.615384615385, 153.452685421995,
191.693290734824, 216, 285.714285714286, 186.335403726708, 218.181818181818,
183.673469387755, 194.174757281553, 202.020202020202, 184.615384615385,
153.452685421995, 191.693290734824), log_rs = c(5.37527840768417,
5.65499231048677, 5.22754829565983, 5.38532874353767, 5.21315955820773,
5.26875856430649, 5.30836770240154, 5.2182746588745, 5.03339228121887,
5.25589665066408, 5.37527840768417, 5.65499231048677, 5.22754829565983,
5.38532874353767, 5.21315955820773, 5.26875856430649, 5.30836770240154,
5.2182746588745, 5.03339228121887, 5.25589665066408)), class = "data.frame",
row.names = c(NA, -20L), .Names = c("subject", "ps", "polarity", "rs", "log_rs"))
The final call would look like this:
temp_df <- data %>%
group_by (subject, polarity) %>%
do (customFun(.$ps, .$rs))
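My custom function does many things (which I skip here for simplicity), including computing max(rs) over a subset of rows selected based on the values of the variable ps. In other words, I only keep the rows whose ps is lower than the ps of row 2 or greater than the ps of row 5, and compute the maximum over those selected rows, as in the dummy example below: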
customFun <- function(df, ps, rs) {
omax = df %>%
filter (ps < ps[2] | ps > ps[5]) %>%
summarise (max(rs))
}
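To check the row-selection logic on its own before adding any grouping, the filter can be run on a single data frame. A minimal sketch, assuming dplyr is installed; `regular` is a hypothetical data frame holding only the ps and rs columns of the 10 "regular" rows from the data above, and `omax_one` is a hypothetical one-argument version of customFun:

```r
library(dplyr)

# Hypothetical subset: the ps and rs columns of the 10 "regular" rows above
regular <- data.frame(
  ps = c(1, 0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1),
  rs = c(216, 285.714285714286, 186.335403726708, 218.181818181818,
         183.673469387755, 194.174757281553, 202.020202020202,
         184.615384615385, 153.452685421995, 191.693290734824)
)

# Hypothetical one-argument variant of customFun: inside filter() and
# summarise(), `ps` and `rs` are looked up as columns of `df`
omax_one <- function(df) {
  df %>%
    filter(ps < ps[2] | ps > ps[5]) %>%
    summarise(omax = max(rs))
}

kept <- regular %>% filter(ps < ps[2] | ps > ps[5])
nrow(kept)         # number of rows that pass the condition
omax_one(regular)  # one-row data frame with a single column, omax
```

With these values ps[2] is 0.9 and ps[5] is 0.6, so every ps is either below 0.9 or above 0.6; all 10 rows pass the filter and the result is simply the overall maximum of rs (about 285.71).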
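The problem is that I want to pass this function to the sub-data frames created by group_by(), so I can't give the data frame used inside my function a specific name. Instead, I want the function to know that it should automatically work on the current chunk of data. I tried something like this: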
omax = . %>%
filter (ps < ps[2] | ps > ps[5]) %>%
summarise (max(rs))
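as well as many other variations, but nothing seems to work... I found some similar questions online, such as here, but still can't figure it out. Any help or hints on how to solve this? Thanks!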
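For reference, a pipeline that starts with `. %>%` does not run on the current chunk by itself: magrittr turns it into a function (a "functional sequence"), which still has to be called on the data. A sketch of calling it explicitly inside do(), assuming dplyr (which re-exports `%>%`); `omax_fs` is a hypothetical name and `data` is an abbreviated copy of the question's data with the log_rs column omitted:

```r
library(dplyr)

# Abbreviated copy of the question's data (log_rs column omitted)
data <- data.frame(
  subject  = factor(rep("Danesh", 20)),
  ps       = rep(c(1, 0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1), 2),
  polarity = factor(rep(c("regular", "reverse"), each = 10)),
  rs       = rep(c(216, 285.714285714286, 186.335403726708, 218.181818181818,
                   183.673469387755, 194.174757281553, 202.020202020202,
                   184.615384615385, 153.452685421995, 191.693290734824), 2)
)

# A pipeline that starts with `.` defines a function instead of evaluating
omax_fs <- . %>%
  filter(ps < ps[2] | ps > ps[5]) %>%
  summarise(omax = max(rs))

# Inside do(), `.` is the current group's chunk, so pass it explicitly
res_fs <- data %>%
  group_by(subject, polarity) %>%
  do(omax_fs(.))
res_fs
```

do() adds the grouping columns (subject, polarity) back to whatever data frame the function returns for each chunk.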
Answer (score: 0)
I found the answer to my question here.
Custom function:
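customFun <- function(df, ps, rs) {
  # Inside filter()/summarise(), ps and rs refer to columns of df
  # (data masking), not to the function arguments of the same name
  omax <- df %>%
    filter(ps < ps[2] | ps > ps[5]) %>%
    summarise(max(rs))
  omax
}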
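Final call (the key change is passing the current chunk (.) as the first argument, so that df inside customFun is the group's data):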
temp_df <- data %>%
group_by (subject, polarity) %>%
do (customFun(., .$ps, .$rs))
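In recent dplyr releases do() is marked as superseded, and the same per-group result can also be written without it. A sketch assuming dplyr 1.0 or later (for group_modify() and the .groups argument of summarise()); `omax_fun` is a hypothetical one-argument version of customFun, and `data` is an abbreviated copy of the question's data with log_rs omitted:

```r
library(dplyr)

# Abbreviated copy of the question's data (log_rs column omitted)
data <- data.frame(
  subject  = factor(rep("Danesh", 20)),
  ps       = rep(c(1, 0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1), 2),
  polarity = factor(rep(c("regular", "reverse"), each = 10)),
  rs       = rep(c(216, 285.714285714286, 186.335403726708, 218.181818181818,
                   183.673469387755, 194.174757281553, 202.020202020202,
                   184.615384615385, 153.452685421995, 191.693290734824), 2)
)

# Hypothetical one-argument version of customFun; group_modify() passes
# each group's rows (without the grouping columns) as .x
omax_fun <- function(df) {
  df %>%
    filter(ps < ps[2] | ps > ps[5]) %>%
    summarise(omax = max(rs))
}

# Option 1: apply the function to each group's data frame
res_modify <- data %>%
  group_by(subject, polarity) %>%
  group_modify(~ omax_fun(.x)) %>%
  ungroup()

# Option 2: the same row selection written directly inside summarise()
res_summ <- data %>%
  group_by(subject, polarity) %>%
  summarise(omax = max(rs[ps < ps[2] | ps > ps[5]]), .groups = "drop")
```

group_modify() keeps the "function of a data frame" shape the question asks for, which suits a longer custom function; the summarise() form is shorter when the computation reduces to one value per group.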