How to subtract grouped data in R

Asked: 2017-04-04 19:23:17

Tags: r dataframe dplyr data-manipulation

I am a computer science student and a new R user.

Here is my data frame.

set.seed(1234)
df <- data.frame(
  sex = rep(c('M', 'F'), 10),
  profession = rep(c('Doctor', 'Lawyer'), each = 5),
  participant = rep(1:10, 2),
  x = runif(20, 1, 10),
  y = runif(20, 1, 10))

[Image of the data frame]

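I want to find the difference in x and y between days for each participant. This should produce a 10-row data frame.

dday would replace day, because its values would describe the difference between days.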

dday sex profession participant dx   dy
0-1  M   Doctor     1           5.22 1.26
.
.
.

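Is there a proper way to do this in R?

2 answers:

Answer 0: (score: 1)

It looks like the day column is missing from your data.frame, although it appears in the picture.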

library(dplyr)

set.seed(1234)
df <- data.frame(day = rep(c(0, 1), each = 10),
             sex = rep(c('M', 'F'), 10),
             profession = rep(c('Doctor', 'Lawyer'), each = 5),
             participant = rep(1:10, 2),
             x = runif(20, 1, 10),
             y = runif(20, 1, 10))

# lag() takes the previous row within each group, so this relies on
# day 0 appearing before day 1 for every participant
df %>%
  group_by(participant) %>%
  mutate(day = paste0(lag(day), "-", day), dx = x - lag(x), dy = y - lag(y)) %>%
  select(-x, -y) %>%
  filter(!is.na(dx))   # drop the day-0 rows, which have no previous row

Source: local data frame [10 x 8]
Groups: participant [10]

     day    sex profession participant         dx         dy
   <chr> <fctr>     <fctr>       <int>      <dbl>      <dbl>
1    0-1      M     Doctor           1  5.2189909  1.2553112
2    0-1      F     Doctor           2 -0.6959211 -0.3375603
3    0-1      M     Doctor           3 -2.9388703  1.3106358
4    0-1      F     Doctor           4  2.7004864  4.2057986
5    0-1      M     Doctor           5 -5.1173959 -0.3393300
6    0-1      F     Lawyer           6  1.7728652 -0.4583513
7    0-1      M     Lawyer           7  2.4905478 -2.9200456
8    0-1      F     Lawyer           8  0.3084325 -5.9026351
9    0-1      M     Lawyer           9 -4.3142487  1.4472483
10   0-1      F     Lawyer          10 -2.5382271  6.8542387
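
If the rows are not guaranteed to arrive sorted by day, one possible variation is to sort first and then collapse each participant to a single row. This is only a sketch under some assumptions: the `toy` data frame and its values are made up for illustration, each participant has exactly two rows (one per day), and the `.groups` argument needs dplyr 1.0 or later.

```r
library(dplyr)

# Hypothetical toy data, deliberately not sorted by day
toy <- data.frame(
  day         = c(1, 0, 1, 0),
  participant = c(1, 1, 2, 2),
  x           = c(7, 2, 4, 9),
  y           = c(3, 1, 8, 5)
)

result <- toy %>%
  arrange(participant, day) %>%          # day 0 now precedes day 1 in each group
  group_by(participant) %>%
  summarise(
    dday = paste(day, collapse = "-"),   # label built from the two days, e.g. "0-1"
    dx   = diff(x),                      # later day minus earlier day
    dy   = diff(y),
    .groups = "drop"
  )

print(result)
```

Using summarise() with diff() avoids the NA rows that lag() leaves behind, so no filter step is needed, but it would fail for a participant with more or fewer than two rows.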

Answer 1: (score: 0)

You can also do it this way:

set.seed(1)

df <- data.frame(
  day = rep(c(0, 1), c(10, 10)),
  sex = rep(c('M', 'F'), 10),
  profession = rep(c('Doctor', 'Lawyer'), each = 5),
  participant = rep(1:10, 2),
  x = runif(20, 1, 10),
  y = runif(20, 1, 10))

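Now we need to group by sex, profession, and participant, and then write a function that returns the differences in the two columns x and y. Remember that a function in R returns the last value it evaluates (in this case, the final data frame).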

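library(plyr)

ddply(df, c("sex", "profession", "participant"),
  function(dat) {
    # rows within each group keep their original order: day 0, then day 1
    ddx = dat$x[[2]] - dat$x[[1]]
    ddy = dat$y[[2]] - dat$y[[1]]
    # the last expression is the value returned for this group
    data.frame(dx = ddx, dy = ddy)
  })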

Output (not reordered):

   sex profession participant         dx         dy
1    F     Doctor           2  3.9572263 -0.9337529
2    F     Doctor           4 -0.6294785  3.6342897
3    F     Lawyer           6  1.6292118 -1.7344123
4    F     Lawyer           8  0.7850676  1.2878669
5    F     Lawyer          10  2.1418901  0.3098424
6    M     Doctor           1 -3.1910030  1.8730386
7    M     Doctor           3 -4.1488559  5.5640663
8    M     Doctor           5  0.9190749 -0.2446371
9    M     Lawyer           7 -3.2924210  5.1612642
10   M     Lawyer           9  0.0743912 -5.4104425
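
For anyone who would rather not load plyr, the same split-apply-combine idea can be sketched in base R. Treat this as an illustration only: `diff_xy` is a hypothetical helper name, the `toy` values are made up, and it assumes exactly two rows (day 0 and day 1) per participant.

```r
# Hypothetical helper: there is no return() call, so the function returns
# the last expression it evaluates, here a one-row data frame
diff_xy <- function(dat) {
  dat <- dat[order(dat$day), ]           # make sure day 0 comes first
  data.frame(dx = dat$x[2] - dat$x[1],
             dy = dat$y[2] - dat$y[1])
}

# Made-up toy data: two participants, two days each
toy <- data.frame(
  day         = c(0, 0, 1, 1),
  participant = c(1, 2, 1, 2),
  x           = c(2, 9, 7, 4),
  y           = c(1, 5, 3, 8)
)

pieces <- lapply(split(toy, toy$participant), diff_xy)  # one piece per participant
result <- do.call(rbind, pieces)                        # stack the one-row pieces
result$participant <- as.integer(rownames(result))      # row names carry the group key

print(result)
```

split() plays the role of the grouping columns passed to ddply, and do.call(rbind, ...) does the recombining that ddply performs automatically.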

I hope this helps. I went with ddply because I find it easy to understand.