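I'm hoping someone knows how to use some form of expand.grid inside a dplyr pipe. I'm doing some modeling where I have several different groups (the Type column below), and each group covers a different range of x and y values. Once I've modeled the data, I'd like to plot the predictions, but I only want to predict over the range that each group actually occupies, not over the full range of the dataset.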
I've posted a working example below, but I'd like to know whether there is a way to avoid the loop and still get the job done.
Cheers
library(ggplot2)
library(dplyr)
# Create some data
df = data.frame(Type = rep(c("A","B"), each = 100),
                x = c(rnorm(100, 0, 1), rnorm(100, 2, 1)),
                y = c(rnorm(100, 0, 1), rnorm(100, 2, 1)))
# and if you want to check out the data
ggplot(df,aes(x,y,col=Type)) + geom_point() + stat_ellipse()
# OK so I have no issue extracting the minimum and maximum values
# for each type
df_summ = df %>%
  group_by(Type) %>%
  summarize(xmin = min(x),
            xmax = max(x),
            ymin = min(y),
            ymax = max(y))
df_summ
# and I can create a loop and use the expand.grid function to get my
# desired output
test = NULL
for (ii in c("A", "B")) {
  df1 = df_summ[df_summ$Type == ii, ]
  x = seq(df1$xmin, df1$xmax, length.out = 10)
  y = seq(df1$ymin, df1$ymax, length.out = 10)
  coords = expand.grid(x = x, y = y)
  coords$Type = ii
  test = rbind(test, coords)
}
ggplot(test, aes(x,y,col = Type)) + geom_point()
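For comparison, here is a minimal base R sketch that builds the same per-Type grid without an explicit for loop, using split() and lapply(). The helper make_grid() and the set.seed() call are illustrative assumptions, not part of the original post, and the sketch takes each group's min and max directly from the raw data instead of going through df_summ:

```r
# Reproducible toy data in the same shape as df above (seed is an assumption)
set.seed(1)
df <- data.frame(Type = rep(c("A", "B"), each = 100),
                 x = c(rnorm(100, 0, 1), rnorm(100, 2, 1)),
                 y = c(rnorm(100, 0, 1), rnorm(100, 2, 1)))

# Hypothetical helper: an n x n grid spanning one group's x and y ranges
make_grid <- function(d, n = 10) {
  coords <- expand.grid(x = seq(min(d$x), max(d$x), length.out = n),
                        y = seq(min(d$y), max(d$y), length.out = n))
  coords$Type <- as.character(d$Type[1])
  coords
}

# split() yields one data frame per Type, lapply() builds each grid,
# and do.call(rbind, ...) stacks them, like the rbind() inside the loop
test <- do.call(rbind, lapply(split(df, df$Type), make_grid))
rownames(test) <- NULL
```

The result has 100 rows per Type, the same shape as the loop's output.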
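But what I'd really like is to get rid of the loop and produce the same output directly from the pipe. I've tried several combinations with do() without success; the attempt below is just one of many that failed: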
df %>%
  group_by(Type) %>%
  summarize(xmin = min(x),
            xmax = max(x),
            ymin = min(y),
            ymax = max(y)) %>%
  do(data.frame(x = seq(xmin, xmax, length.out = 10),
                y = seq(ymin, ymax, length.out = 10)))
# this last line returns an error
# Error in is.finite(from) :
# default method not implemented for type 'closure'
Answer 0 (score: 2)
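Your do() attempt is almost right. The trick is to regroup after summarize(): summarize() removes the last level of grouping, so with Type as the only grouping variable its output is ungrouped, and do() would run once over the whole summary instead of once per Type. You also need to use .$ to pull the values from the data in the chain, because inside do() the current group's data is available as . rather than through bare column names. Try this: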
test <- df %>%
  group_by(Type) %>%
  summarize(xmin = min(x),
            xmax = max(x),
            ymin = min(y),
            ymax = max(y)) %>%
  group_by(Type) %>%
  do(expand.grid(x = seq(.$xmin, .$xmax, length.out = 10),
                 y = seq(.$ymin, .$ymax, length.out = 10)))
ggplot(test, aes(x,y,col = Type)) + geom_point()
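In current dplyr releases do() is marked as superseded. A sketch of the same result with group_modify(), assuming the same toy data (the set.seed() call is an addition for reproducibility): group_modify() hands each group's rows to the function as .x, without the grouping column, and expects a data frame back, which expand.grid() returns. Because the min and max are computed per group inside the function, the separate summarize() and regrouping steps are not needed:

```r
library(dplyr)

# Reproducible toy data in the same shape as df above (seed is an assumption)
set.seed(1)
df <- data.frame(Type = rep(c("A", "B"), each = 100),
                 x = c(rnorm(100, 0, 1), rnorm(100, 2, 1)),
                 y = c(rnorm(100, 0, 1), rnorm(100, 2, 1)))

test <- df %>%
  group_by(Type) %>%
  # .x is the current group's data; the Type key is added back automatically
  group_modify(~ expand.grid(x = seq(min(.x$x), max(.x$x), length.out = 10),
                             y = seq(min(.x$y), max(.x$y), length.out = 10)))
```

The result is still grouped by Type; pipe it into ungroup() before plotting if you don't want the grouping carried along. With dplyr 1.1.0 or later, reframe() wrapping the same expand.grid() call (with bare x and y) is another option.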
Answer 1 (score: 0)