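Additional columns for a summarized data frame

Time: 2019-07-18 01:46:35

Tags: r dplyr

I want to add columns to my summarized data frame that total the usage for specific factor levels (here, each sex).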

bookplace <- data.frame(type = c("reading", "reading", "reading", "reading", "lending", "lending"), 
                        sex = c("male", "female", "male", "female", "male", "female"), 
                        usage = c(103, 102, 23, 14, 16, 8), 
                        date = c("1/1/18","1/1/18","1/1/18","1/1/18","1/1/18","1/1/18"),
                        stringsAsFactors = FALSE)

The result should be (with male and female as the added columns):

year  type    users  male  female
2018  lending    24    16       8
2018  reading   242   126     116

I tried adding the columns with mutate and then summarizing, using the following code:

bookplace %>% 
  mutate(males=count(sex=="male"),
         females=count(sex=="female")) %>%
  group_by(year=format(date,"%Y"), type) %>% 
  summarize(users=sum(usage))

But I get the following error message:

  

Error in UseMethod("groups") : no applicable method for 'groups' applied to an object of class "logical"

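Any help would be appreciated.

2 answers:

Answer 0: (score: 0)

A tidyverse solution. It assumes the dates are in %m/%d/%y format. If they are not, change the format string accordingly.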
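A note on the error, offered as a likely reading rather than a confirmed diagnosis: dplyr's count() expects a data frame as its first argument, so handing it the logical vector sex == "male" is the probable trigger. A conditional total is usually written as sum() over a logical index instead. The sketch below uses base R only; the names is_male, n_male_rows and male_usage are illustrative and do not come from the question.

```r
# Same data as in the question
bookplace <- data.frame(
  type  = c("reading", "reading", "reading", "reading", "lending", "lending"),
  sex   = c("male", "female", "male", "female", "male", "female"),
  usage = c(103, 102, 23, 14, 16, 8),
  date  = c("1/1/18", "1/1/18", "1/1/18", "1/1/18", "1/1/18", "1/1/18"),
  stringsAsFactors = FALSE
)

# Logical vector: TRUE for rows where sex is "male"
is_male <- bookplace$sex == "male"

# Summing a logical vector counts the TRUE rows
n_male_rows <- sum(is_male)

# Indexing usage by the logical vector totals usage for those rows only
male_usage <- sum(bookplace$usage[is_male])
```

The desired output in the question totals usage per sex, so the second form (summing usage over a logical index) is the one that matches it.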

library(dplyr)
library(tidyr)

bookplace %>% 
  mutate(year = format(as.Date(date, "%m/%d/%y"), "%Y")) %>% 
  group_by(year, sex, type) %>% 
  summarise(Total = sum(usage)) %>% 
  ungroup() %>% 
  spread(sex, Total) %>% 
  mutate(users = female + male)

Result:

# A tibble: 2 x 5
  year  type    female  male users
  <chr> <chr>    <dbl> <dbl> <dbl>
1 2018  lending      8    16    24
2 2018  reading    116   126   242
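For readers on newer package versions, the same reshaping can be sketched with pivot_wider(), which tidyr (1.0.0 and later) offers as the successor to spread(). This is a variant under stated assumptions, not the answerer's code: it assumes dplyr 1.0.0 or later for the .groups argument, and the name result is illustrative.

```r
library(dplyr)
library(tidyr)

# Same data as in the question
bookplace <- data.frame(
  type  = c("reading", "reading", "reading", "reading", "lending", "lending"),
  sex   = c("male", "female", "male", "female", "male", "female"),
  usage = c(103, 102, 23, 14, 16, 8),
  date  = c("1/1/18", "1/1/18", "1/1/18", "1/1/18", "1/1/18", "1/1/18"),
  stringsAsFactors = FALSE
)

result <- bookplace %>%
  # Parse the date column (assumed %m/%d/%y) and keep only the year
  mutate(year = format(as.Date(date, "%m/%d/%y"), "%Y")) %>%
  group_by(year, type, sex) %>%
  # Total usage per year/type/sex, dropping the grouping afterwards
  summarise(total = sum(usage), .groups = "drop") %>%
  # One column per sex, filled with the totals
  pivot_wider(names_from = sex, values_from = total) %>%
  mutate(users = male + female)
```

The totals should match the tibble shown above; only the reshaping call differs.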

Answer 1: (score: 0)

Here is an answer using dplyr

library(dplyr)

bookplace <- data.frame(c("reading", "reading", "reading", 
                          "reading", "lending", "lending"), 
                        c("male", "female", "male", "female", "male", "female"), 
                        c(103, 102, 23, 14, 16, 8), 
                        c("1/1/18","1/1/18","1/1/18","1/1/18","1/1/18","1/1/18"))
colnames(bookplace) <- c("type","Gender","Usage","Year")
# %y (two-digit year) matches "1/1/18"; %Y would parse the year as 18, not 2018
bookplace$Year <- format(as.Date(bookplace$Year, format = "%d/%m/%y"), "%Y")
# Total usage per year and type, plus conditional totals for each gender
check <- bookplace %>%
  group_by(Year, type) %>%
  summarise(Users = sum(Usage),
            male = sum(Usage[Gender == "male"]),
            female = sum(Usage[Gender == "female"]))

I got the idea from this question: Summarize with conditions in dplyr
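Both answers depend on the format string passed to as.Date(), and the sample value "1/1/18" cannot show whether the day or the month comes first. A small base R sketch of how the choice changes the parsed date; the value "2/1/18" and the variable names are made up for illustration and are not part of the question's data.

```r
# A date string where day-first and month-first readings differ
ambiguous <- "2/1/18"

# Month first: 1 February 2018
month_first <- as.Date(ambiguous, format = "%m/%d/%y")

# Day first: 2 January 2018
day_first <- as.Date(ambiguous, format = "%d/%m/%y")

# Either way, %y reads "18" as 2018, and "%Y" on output gives the full year
year_only <- format(month_first, "%Y")
```

Grouping by year alone is unaffected by the day/month order, but any later grouping by month would be.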