Calculating differences between subsets in a data frame

Time: 2016-07-05 01:50:49

Tags: r dataframe dplyr

I have a data frame with two factors, like this:

  Eyecolour Haircolour Points
     <fctr>     <fctr>  <dbl>
1     brown      blond      4
2     brown   brunette     -8
3      blue      blond      2
4      blue   brunette      3
5     green      blond     -5
6     green   brunette      9

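I want the difference in Points between blond and brunette for each Eyecolour, or simply to subtract blond from brunette for each Eyecolour.

I tried using the dplyr package, but I am struggling to get the code right. diff() also doesn't seem to like negative values.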

2 answers:

Answer 0: (score: 2)

Using your data

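# stringsAsFactors = TRUE keeps the <fctr> columns shown in the question
# (since R 4.0 the default is FALSE)
df <- read.table(text = "
Eyecolour Haircolour Points
brown blond         4
brown brunette      -8
blue  blond         2
blue  brunette      3
green blond         -5
green brunette      9", header = TRUE, stringsAsFactors = TRUE)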

you can try

library(dplyr)
library(tidyr)
df %>%
  tidyr::spread(Haircolour, Points) %>%
  dplyr::mutate(diff = blond - brunette)

RESULT

  Eyecolour blond brunette diff
1      blue     2        3   -1
2     brown     4       -8   12
3     green    -5        9  -14
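
In current tidyr releases, spread() is superseded by pivot_wider(). The following is a minimal sketch of the same reshape-then-subtract idea with the newer function; it rebuilds the example data with data.frame() so that it runs on its own, and the name wide is only illustrative.

```r
library(dplyr)
library(tidyr)

# Example data from the question, rebuilt so the snippet is self-contained
df <- data.frame(
  Eyecolour  = rep(c("brown", "blue", "green"), each = 2),
  Haircolour = rep(c("blond", "brunette"), times = 3),
  Points     = c(4, -8, 2, 3, -5, 9)
)

# One column per hair colour, then subtract the two columns row by row
wide <- df %>%
  pivot_wider(names_from = Haircolour, values_from = Points) %>%
  mutate(diff = blond - brunette)

wide
# diff per eye colour: brown 12, blue -1, green -14
```

The per-group values match the spread() result above; only the reshaping function differs.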

Answer 1: (score: 2)

We can use

library(dplyr)
df %>% 
   mutate(Haircolour = as.character(Haircolour)) %>% 
   group_by(Eyecolour) %>% 
   summarise(Diff = Points[Haircolour=="blond"] - Points[Haircolour =="brunette"])
#   Eyecolour  Diff 
#      <fctr> <int>
#1      blue    -1
#2     brown    12
#3     green   -14

or use data.table

 library(data.table)
 dcast(setDT(df), Eyecolour~Haircolour, value.var="Points")[, Diff:= blond-brunette][]
 #   Eyecolour blond brunette Diff
 #1:      blue     2        3   -1
 #2:     brown     4       -8   12
 #3:     green    -5        9  -14
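
On the question's remark about diff(): base R's diff() handles negative values without trouble; it returns each element minus the one before it. The sketch below, which assumes exactly one blond and one brunette row per eye colour, applies it within each group using base R only. The name res is illustrative. Note that the sign is brunette minus blond, the opposite of the blond - brunette columns computed in the answers.

```r
# Example data from the question, rebuilt so the snippet is self-contained
df <- data.frame(
  Eyecolour  = rep(c("brown", "blue", "green"), each = 2),
  Haircolour = rep(c("blond", "brunette"), times = 3),
  Points     = c(4, -8, 2, 3, -5, 9)
)

# Sort so that blond comes before brunette within each eye colour
df <- df[order(df$Eyecolour, df$Haircolour), ]

# diff() on a two-element group gives second minus first: brunette - blond
res <- tapply(df$Points, df$Eyecolour, diff)

res
# blue 1, brown -12, green 14
```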