Count how many times each of two values occurs in a column, grouped by the unique values of another column

Asked: 2018-12-11 19:25:02

Tags: r dplyr

My data frame looks like this:

year<-c("2000","2000","2001","2002","2000")
gender<-c("M","F","M","F","M")
YG<-data.frame(year,gender)

In this data frame, I want to count the number of "M" and "F" values for each year, and then create a new data frame like this:

  year M F
1 2000 2 1
2 2001 1 0
3 2002 0 1

I tried something like this:

library(dplyr)
ns<-YG %>%
  group_by(year) %>%
  count(YG$gender == "M")

2 answers:

Answer 0 (score: 2)

A solution using reshape2:

library(reshape2)
dcast(YG, year ~ gender, fun.aggregate = length)  # count the rows in each year/gender cell

  year F M
1 2000 1 2
2 2001 0 1
3 2002 1 0

Or another tidyverse solution:

YG %>%
 group_by(year) %>%
 summarise(M = length(gender[gender == "M"]),
           F = length(gender[gender == "F"]))

  year      M     F
  <fct> <int> <int>
1 2000      2     1
2 2001      1     0
3 2002      0     1

Or, as suggested by @zx8754:

YG %>%
 group_by(year) %>%
 summarise(M = sum(gender == "M"),
           F = sum(gender == "F"))
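The two summarise() variants above give the same counts because comparing a vector with == returns a logical vector, and sum() treats TRUE as 1 and FALSE as 0. A minimal sketch of that equivalence on the gender column alone (the names is_male, n_male and n_male_len are only illustrative and do not come from the answer):

```r
gender <- c("M", "F", "M", "F", "M")

# Comparing with == yields a logical vector: TRUE FALSE TRUE FALSE TRUE
is_male <- gender == "M"

# sum() coerces TRUE to 1 and FALSE to 0, so it counts the matches
n_male <- sum(is_male)

# Subsetting by the same condition and taking the length gives the same count
n_male_len <- length(gender[gender == "M"])

print(n_male)      # 3
print(n_male_len)  # 3
```

The sum() form skips the intermediate subset, which is probably why it was suggested as the shorter alternative.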

Answer 1 (score: 1)

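We can use count and spread to get the data frame into the desired format, passing fill = 0 to spread so that missing year/gender combinations are filled with 0: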

library(tidyverse)
YG %>%
  group_by(year) %>%
  count(gender) %>%
  spread(gender, n, fill = 0)

Output:

# A tibble: 3 x 3
# Groups:   year [3]
  year      F     M
  <fct> <dbl> <dbl>
1 2000      1     2
2 2001      0     1
3 2002      1     0
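For readers on a newer tidyr: spread() has since been superseded by pivot_wider(). The following is a sketch of what the same count-then-reshape step might look like, assuming tidyr 1.0.0 or later is installed; it is not part of the original answer, and the name wide is only illustrative.

```r
library(dplyr)
library(tidyr)

YG <- data.frame(
  year   = c("2000", "2000", "2001", "2002", "2000"),
  gender = c("M", "F", "M", "F", "M")
)

wide <- YG %>%
  count(year, gender) %>%                  # one row per year/gender pair, count in n
  pivot_wider(names_from  = gender,        # one column per gender value
              values_from = n,             # filled with the counts
              values_fill = list(n = 0L))  # missing combinations become 0

print(wide)
```

Passing year and gender directly to count() avoids the separate group_by() call, so the result is not left grouped.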