rowMeans function in dplyr

Asked: 2015-03-16 17:51:41

Tags: r dplyr

I have been trying to run a rowMeans calculation inside dplyr's mutate function, but I keep getting errors. Below is an example dataset and the desired result.

DATA = data.frame(SITE = c("A","A","A","A","B","B","B","C","C"),
                  DATE = c("1","1","2","2","3","3","3","4","4"),
                  STUFF = c(1, 2, 30, 40, 100, 200, 300, 5000, 6000),
                  STUFF2 = c(2, 4, 60, 80, 200, 400, 600, 10000, 12000))

RESULT = data.frame(SITE = c("A","A","A","A","B","B","B","C","C"),
                    DATE = c("1","1","2","2","3","3","3","4","4"),
                    STUFF = c(1, 2, 30, 40, 100, 200, 300, 5000, 6000),
                    STUFF2 = c(2, 4, 60, 80, 200, 400, 600, 10000, 12000),
                    NAYSA = c(1.5, 3, 45, 60, 150, 300, 450, 7500, 9000))

The code I wrote first randomly samples STUFF and STUFF2. Then I want to calculate the rowMeans of STUFF and STUFF2 and write the result to a new column. I can accomplish this with tidyr, but I would have to redo it for many more variables. I could also use base R, but I would prefer a solution that uses the mutate function in dplyr. Thanks in advance.

4 Answers:

Answer 0 (score: 9)

@GregF Yes, ungroup() was the key. Thanks.

Working code:

RESULT = group_by(DATA, SITE, DATE) %>% 
  mutate(STUFF = sample(STUFF,replace= TRUE), 
         STUFF2 = sample(STUFF2,replace= TRUE)) %>% 
  ungroup() %>% 
  mutate(NAYSA = rowMeans(.[,-1:-2]))
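
The negative positions in rowMeans(.[,-1:-2]) depend on the column order. As a minimal sketch of the same pipeline, assuming the magrittr `.` placeholder and dplyr's select(), the two columns can be picked by name instead. The set.seed() call is an addition here, only to make the resampling repeatable; it is not part of the original answer.

```r
library(dplyr)

# Same example data as in the question.
DATA <- data.frame(
  SITE = c("A","A","A","A","B","B","B","C","C"),
  DATE = c("1","1","2","2","3","3","3","4","4"),
  STUFF = c(1, 2, 30, 40, 100, 200, 300, 5000, 6000),
  STUFF2 = c(2, 4, 60, 80, 200, 400, 600, 10000, 12000)
)

set.seed(1)  # assumption: added only so the random resampling is repeatable

RESULT <- DATA %>%
  group_by(SITE, DATE) %>%
  # Resample each column within its SITE/DATE group.
  mutate(STUFF = sample(STUFF, replace = TRUE),
         STUFF2 = sample(STUFF2, replace = TRUE)) %>%
  # Remove the grouping so the next mutate() sees all rows at once.
  ungroup() %>%
  # `.` is the ungrouped, resampled data; select() picks columns by name.
  mutate(NAYSA = rowMeans(select(., STUFF, STUFF2)))
```

Selecting by name keeps working if more identifier columns are added in front of STUFF and STUFF2.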

Answer 1 (score: 7)

You need the rowwise function in dplyr to do this. Your data is random (because of sample), so the results will differ, but you will see how it works:

library(dplyr)
group_by(DATA, SITE, DATE) %>%
  mutate(STUFF = sample(STUFF, replace = TRUE), STUFF2 = sample(STUFF2, replace = TRUE)) %>%
  rowwise() %>%
  mutate(NAYSA = mean(c(STUFF, STUFF2)))

Output:

Source: local data frame [9 x 5]
Groups: <by row>

  SITE DATE STUFF STUFF2  NAYSA
1    A    1     1      2    1.5
2    A    1     2      2    2.0
3    A    2    30     80   55.0
4    A    2    30     60   45.0
5    B    3   200    600  400.0
6    B    3   300    200  250.0
7    B    3   100    600  350.0
8    C    4  5000  12000 8500.0
9    C    4  6000  10000 8000.0

As you can see, the row-wise mean is computed for each row from STUFF and STUFF2.

Answer 2 (score: 0)

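The rowMeans function needs an input with at least two dimensions, but DATA[,-1:-3] leaves only one column, which R simplifies to a plain vector: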

[1]     2     4    60    80   200   400   600 10000 12000

You can get the result with the following code:

DATA%>%
        group_by(SITE, DATE) %>% 
        ungroup() %>% 
        mutate(NAYSA = rowMeans(.[,3:4]))

  SITE DATE STUFF STUFF2  NAYSA
1    A    1     1      2    1.5
2    A    1     2      4    3.0
3    A    2    30     60   45.0
4    A    2    40     80   60.0
5    B    3   100    200  150.0
6    B    3   200    400  300.0
7    B    3   300    600  450.0
8    C    4  5000  10000 7500.0
9    C    4  6000  12000 9000.0
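
The dimension problem described in this answer can be checked in base R alone. The sketch below is illustrative only; the variable names one_col, err, kept and naysa are made up for this example. It shows that the single-column subset has no dimensions, that drop = FALSE preserves them, and that selecting both numeric columns gives the desired means.

```r
# Same example data as in the question.
DATA <- data.frame(
  SITE = c("A","A","A","A","B","B","B","C","C"),
  DATE = c("1","1","2","2","3","3","3","4","4"),
  STUFF = c(1, 2, 30, 40, 100, 200, 300, 5000, 6000),
  STUFF2 = c(2, 4, 60, 80, 200, 400, 600, 10000, 12000)
)

# Dropping columns 1 to 3 leaves one column, which `[` simplifies to a vector.
one_col <- DATA[, -1:-3]
is.null(dim(one_col))  # the vector has no dim attribute

# rowMeans() on the bare vector raises an error; capture the message
# instead of stopping the script.
err <- tryCatch(rowMeans(one_col), error = function(e) conditionMessage(e))

# drop = FALSE keeps the one-column data frame, which rowMeans() accepts.
kept <- rowMeans(DATA[, -1:-3, drop = FALSE])

# Selecting both numeric columns gives the means the question asks for.
naysa <- rowMeans(DATA[, c("STUFF", "STUFF2")])
```

With a single column, the row mean is just the column itself, so kept equals STUFF2; only the two-column selection produces NAYSA.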

Answer 3 (score: 0)

Another approach (in my view the best one) is to use map2_dbl from purrr:

library(purrr)
library(dplyr)
DATA %>% 
  mutate(NAYSA = map2_dbl(STUFF, STUFF2, ~mean(c(.x, .y))))

Output:

  SITE DATE STUFF STUFF2  NAYSA
1    A    1     1      2    1.5
2    A    1     2      4    3.0
3    A    2    30     60   45.0
4    A    2    40     80   60.0
5    B    3   100    200  150.0
6    B    3   200    400  300.0
7    B    3   300    600  450.0
8    C    4  5000  10000 7500.0
9    C    4  6000  12000 9000.0