dplyr difference between pairs in nested groups

Date: 2017-12-05 18:37:47

Tags: r dplyr

I'd like to use dplyr to calculate the difference in value between the two people nested in each pair, separately for each session. Example data:

dat <- data.frame(person=c(rep(1, 10), rep(2, 10), rep(3, 10), rep(4, 10), rep(5, 10), rep(6, 10), rep(7, 10), rep(8, 10)),
                  pair=c(rep(1, 20), rep(2, 20), rep(3, 20), rep(4, 20)),
                  condition=c(rep("NEW", 10), rep("OLD", 10), rep("NEW", 10), rep("OLD", 10), rep("NEW", 10), rep("OLD", 10), rep("NEW", 10), rep("OLD", 10)),
                  session=rep(seq(from=1, to=10, by=1), 8),
                  value=c(0, 2, 4, 8, 16, 16, 18, 20, 20, 20,
                          0, 1, 1, 2, 4, 5, 8, 12, 15, 15,
                          0, 2, 8, 10, 15, 16, 18, 20, 20, 20,
                          0, 4, 4, 6, 6, 8, 10, 12, 12, 18,
                          0, 6, 8, 10, 16, 16, 18, 20, 20, 20,
                          0, 2, 2, 3, 4, 8, 8, 8, 10, 12,
                          0, 10, 12, 16, 18, 18, 18, 20, 20, 20,
                          0, 2, 2, 8, 10, 10, 11, 12, 15, 20))

For example, person 1 and person 2 are paired (pair==1):

  • session==2 & person==1 (NEW): 2
  • session==2 & person==2 (OLD): 1

The difference (NEW - OLD) is 2 - 1 = 1.
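The arithmetic in this example can be checked directly in base R. The sketch below uses only pair 1's rows (copied from the question's data); the names pair1, diff_s2 and diff_all are illustrative. The second subtraction assumes, as holds in the question's data, that each person's rows are stored in session order.

```r
# Pair 1 only (persons 1 and 2), copied from the question's data
pair1 <- data.frame(
  person    = rep(c(1, 2), each = 10),
  condition = rep(c("NEW", "OLD"), each = 10),
  session   = rep(1:10, times = 2),
  value     = c(0, 2, 4, 8, 16, 16, 18, 20, 20, 20,  # person 1 (NEW)
                0, 1, 1, 2, 4, 5, 8, 12, 15, 15),    # person 2 (OLD)
  stringsAsFactors = FALSE
)

# Session 2: NEW value minus OLD value
s2 <- pair1[pair1$session == 2, ]
diff_s2 <- s2$value[s2$condition == "NEW"] - s2$value[s2$condition == "OLD"]
print(diff_s2)  # 2 - 1 = 1

# The same subtraction for all sessions at once; relies on both persons'
# rows being in session order, as they are in the question's data
diff_all <- pair1$value[pair1$condition == "NEW"] -
  pair1$value[pair1$condition == "OLD"]
print(diff_all)
```

diff_all comes out as 0, 1, 3, 6, 12, 11, 10, 8, 5, 5, which matches the pair 1 rows printed in both answers.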

Here is what I have tried so far. I think I need to group_by() and then summarise(), but I haven't cracked this nut yet:

dat %>% mutate(session = factor(session)) %>% group_by(condition, pair, session) %>% summarise(pairDiff = value-first(value))
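In the attempt above, condition is one of the grouping variables, so every group holds a single row and value - first(value) is 0 everywhere. A minimal sketch of a fix, assuming each pair/session combination has exactly one NEW and one OLD row, drops condition from group_by() and selects the two values by label instead of by position. The data is rebuilt compactly with rep(..., each = ...); this is an equivalent rewrite of the question's data.frame() call, except that person, pair and session become integer columns.

```r
library(dplyr)

# Same data as the question, built more compactly
dat <- data.frame(
  person    = rep(1:8, each = 10),
  pair      = rep(1:4, each = 20),
  condition = rep(rep(c("NEW", "OLD"), each = 10), times = 4),
  session   = rep(1:10, times = 8),
  value     = c(0, 2, 4, 8, 16, 16, 18, 20, 20, 20,
                0, 1, 1, 2, 4, 5, 8, 12, 15, 15,
                0, 2, 8, 10, 15, 16, 18, 20, 20, 20,
                0, 4, 4, 6, 6, 8, 10, 12, 12, 18,
                0, 6, 8, 10, 16, 16, 18, 20, 20, 20,
                0, 2, 2, 3, 4, 8, 8, 8, 10, 12,
                0, 10, 12, 16, 18, 18, 18, 20, 20, 20,
                0, 2, 2, 8, 10, 10, 11, 12, 15, 20),
  stringsAsFactors = FALSE
)

# Group by pair and session only (not condition), so each group contains
# one NEW row and one OLD row; subtract by label rather than by position
res <- dat %>%
  group_by(pair, session) %>%
  summarise(pairDiff = value[condition == "NEW"] - value[condition == "OLD"]) %>%
  ungroup()

print(res)
```

Selecting by label means the result does not depend on row order within a group.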


Desired output: [image not available]

2 answers:

Answer 0 (score: 2)

Your output can be obtained with:

library(dplyr)

dat %>% group_by(pair, session) %>% arrange(condition) %>% summarise(diff = -diff(value))
Source: local data frame [40 x 3]
Groups: pair [?]

# A tibble: 40 x 3
    pair session  diff
   <dbl>   <dbl> <dbl>
 1     1       1     0
 2     1       2     1
 3     1       3     3
 4     1       4     6
 5     1       5    12
 6     1       6    11
 7     1       7    10
 8     1       8     8
 9     1       9     5
10     1      10     5
# ... with 30 more rows

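arrange(condition) puts NEW before OLD (alphabetical order), so -diff(value) = -(OLD - NEW) = NEW - OLD. The solution does, however, depend on each pair and session combination having exactly 2 values.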
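Because -diff(value) silently uses whatever rows end up in each group, a cheap safeguard is to verify the two-rows-per-group assumption before relying on it. A sketch, using count() to confirm that every pair/session/condition combination occurs exactly once; the data is a compact but equivalent rebuild of the question's data.frame(), and .by_group = TRUE is added to arrange() so the sort is explicitly done within each group.

```r
library(dplyr)

# Same data as the question, built more compactly
dat <- data.frame(
  person    = rep(1:8, each = 10),
  pair      = rep(1:4, each = 20),
  condition = rep(rep(c("NEW", "OLD"), each = 10), times = 4),
  session   = rep(1:10, times = 8),
  value     = c(0, 2, 4, 8, 16, 16, 18, 20, 20, 20,
                0, 1, 1, 2, 4, 5, 8, 12, 15, 15,
                0, 2, 8, 10, 15, 16, 18, 20, 20, 20,
                0, 4, 4, 6, 6, 8, 10, 12, 12, 18,
                0, 6, 8, 10, 16, 16, 18, 20, 20, 20,
                0, 2, 2, 3, 4, 8, 8, 8, 10, 12,
                0, 10, 12, 16, 18, 18, 18, 20, 20, 20,
                0, 2, 2, 8, 10, 10, 11, 12, 15, 20),
  stringsAsFactors = FALSE
)

# Each pair x session x condition combination should appear exactly once:
# 4 pairs * 10 sessions * 2 conditions = 80 combinations
counts <- dat %>% count(pair, session, condition)
stopifnot(nrow(counts) == 80, all(counts$n == 1))

# With the assumption confirmed: sort NEW before OLD inside each group,
# so -diff(value) = -(OLD - NEW) = NEW - OLD
res <- dat %>%
  group_by(pair, session) %>%
  arrange(condition, .by_group = TRUE) %>%
  summarise(diff = -diff(value)) %>%
  ungroup()

print(res)
```

If the stopifnot() call fails, some group is missing a condition or has duplicates, and the positional -diff() result would not be a NEW - OLD difference.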

Answer 1 (score: 1)

You can spread condition into the column headers and then subtract NEW - OLD:

library(dplyr); library(tidyr)

dat %>% 
    select(-person) %>% 
    spread(condition, value) %>% 
    mutate(diff = NEW - OLD) %>% 
    select(session, pair, diff)

# A tibble: 40 x 3
#   session  pair  diff
#     <dbl> <dbl> <dbl>
# 1       1     1     0
# 2       2     1     1
# 3       3     1     3
# 4       4     1     6
# 5       5     1    12
# 6       6     1    11
# 7       7     1    10
# 8       8     1     8
# 9       9     1     5
#10      10     1     5
# ... with 30 more rows
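spread() still works, but in tidyr 1.0.0 and later it is superseded by pivot_wider(). Assuming tidyr 1.0.0 or newer is installed, an equivalent sketch of this answer is shown below; the data is a compact but equivalent rebuild of the question's data.frame().

```r
library(dplyr)
library(tidyr)

# Same data as the question, built more compactly
dat <- data.frame(
  person    = rep(1:8, each = 10),
  pair      = rep(1:4, each = 20),
  condition = rep(rep(c("NEW", "OLD"), each = 10), times = 4),
  session   = rep(1:10, times = 8),
  value     = c(0, 2, 4, 8, 16, 16, 18, 20, 20, 20,
                0, 1, 1, 2, 4, 5, 8, 12, 15, 15,
                0, 2, 8, 10, 15, 16, 18, 20, 20, 20,
                0, 4, 4, 6, 6, 8, 10, 12, 12, 18,
                0, 6, 8, 10, 16, 16, 18, 20, 20, 20,
                0, 2, 2, 3, 4, 8, 8, 8, 10, 12,
                0, 10, 12, 16, 18, 18, 18, 20, 20, 20,
                0, 2, 2, 8, 10, 10, 11, 12, 15, 20),
  stringsAsFactors = FALSE
)

# pivot_wider() turns the NEW/OLD labels into columns, one row per
# pair/session (the remaining id columns once person is dropped)
res <- dat %>%
  select(-person) %>%
  pivot_wider(names_from = condition, values_from = value) %>%
  mutate(diff = NEW - OLD) %>%
  select(session, pair, diff)

print(res)
```

Dropping person first matters: otherwise person would become an id column and each row would hold only one of NEW or OLD, with the other filled by NA.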