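I have a data frame of restaurant inspections sorted by date. For each observation, I want to add two variables: how many inspections that restaurant has had, and how many of them it failed. I'd like to avoid a for loop, but I'm not sure how to do that. My current data frame (the first table below) has only the first three columns, and I want to add the last two columns shown in the second table.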
Restaurant_ID Date Result
1 01/02/2011 Pass
2 02/05/2011 Pass
3 04/07/2011 Fail
1 09/05/2011 Fail
2 03/13/2012 Pass
1 08/25/2012 Fail
2 09/25/2012 Pass
3 01/05/2013 Pass
Restaurant_ID Date Result total_inspect failed_inspect
1 01/02/2011 Pass 1 0
2 02/05/2011 Pass 1 0
3 04/07/2011 Fail 1 1
1 09/05/2011 Fail 2 1
2 03/13/2012 Pass 2 0
1 08/25/2012 Fail 3 2
2 09/25/2012 Pass 3 0
3 01/05/2013 Pass 2 1
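For comparison, the cumulative counts in the table above can also be computed in base R without a for loop. The following is a minimal sketch that assumes the dates are in mm/dd/yyyy format. It uses ave(), which applies a function within each group and returns a vector aligned with the original rows:

```r
# Example input: the three-column data frame from the question
dt1 <- read.csv(text = "Restaurant_ID,Date,Result
1,01/02/2011,Pass
2,02/05/2011,Pass
3,04/07/2011,Fail
1,09/05/2011,Fail
2,03/13/2012,Pass
1,08/25/2012,Fail
2,09/25/2012,Pass
3,01/05/2013,Pass", stringsAsFactors = FALSE)

# Parse the mm/dd/yyyy strings so that sorting is chronological, not alphabetical
dt1$Date <- as.Date(dt1$Date, format = "%m/%d/%Y")
dt1 <- dt1[order(dt1$Date), ]

# Running inspection count per restaurant: 1, 2, 3, ... within each Restaurant_ID
dt1$total_inspect <- ave(seq_len(nrow(dt1)), dt1$Restaurant_ID, FUN = seq_along)

# Running failure count per restaurant: cumulative sum of a 0/1 failure flag
dt1$failed_inspect <- ave(as.integer(dt1$Result == "Fail"), dt1$Restaurant_ID,
                          FUN = cumsum)
dt1
```

Because ave() writes each group's result back into the original row positions, the output stays in date order.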
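Edit: I realized that I actually want the last two columns to count the inspections and failures *before* the current observation. So what I really want is: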
Restaurant_ID Date Result past_inspect past_failed_inspect
1 01/02/2011 Pass 0 0
2 02/05/2011 Pass 0 0
3 04/07/2011 Fail 0 0
1 09/05/2011 Fail 1 0
2 03/13/2012 Pass 1 0
1 08/25/2012 Fail 2 1
2 09/25/2012 Pass 2 0
3 01/05/2013 Pass 1 1
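The "before the current observation" counts are the cumulative counts minus the current row: one fewer inspection, and one fewer failure when the current result is Fail. Here is a base-R sketch of that idea, again assuming mm/dd/yyyy dates. The helper vector failed is a name introduced here for illustration:

```r
# Example input: the three-column data frame from the question
dt1 <- read.csv(text = "Restaurant_ID,Date,Result
1,01/02/2011,Pass
2,02/05/2011,Pass
3,04/07/2011,Fail
1,09/05/2011,Fail
2,03/13/2012,Pass
1,08/25/2012,Fail
2,09/25/2012,Pass
3,01/05/2013,Pass", stringsAsFactors = FALSE)

dt1$Date <- as.Date(dt1$Date, format = "%m/%d/%Y")
dt1 <- dt1[order(dt1$Date), ]

# Hypothetical helper: 1 if this inspection failed, 0 otherwise
failed <- as.integer(dt1$Result == "Fail")

# Inspections before this one = position within the restaurant's history minus 1
dt1$past_inspect <- ave(failed, dt1$Restaurant_ID, FUN = seq_along) - 1L

# Failures before this one = running failure total minus the current row's flag
dt1$past_failed_inspect <- ave(failed, dt1$Restaurant_ID, FUN = cumsum) - failed
dt1
```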
Answer 0 (score: 3)
This solution uses functions from the tidyverse and lubridate packages.
# Create the example data frame
dt1 <- read.csv(text = "Restaurant_ID,Date,Result
1,01/02/2011,Pass
2,02/05/2011,Pass
3,04/07/2011,Fail
1,09/05/2011,Fail
2,03/13/2012,Pass
1,08/25/2012,Fail
2,09/25/2012,Pass
3,01/05/2013,Pass",
stringsAsFactors = FALSE)
# Load packages
library(tidyverse)
library(lubridate)
dt2 <- dt1 %>%
  # Convert the Date column to Date class
  mutate(Date = mdy(Date)) %>%
  # Sort the data frame by Restaurant_ID and Date
  arrange(Restaurant_ID, Date) %>%
  # Group the data by restaurant ID
  group_by(Restaurant_ID) %>%
  # total_inspect: running count of inspections for each restaurant
  mutate(total_inspect = row_number()) %>%
  # fail_result: 1 for Fail, 0 for Pass
  mutate(fail_result = ifelse(Result == "Fail", 1, 0)) %>%
  # failed_inspect: cumulative sum of fail_result within each restaurant
  mutate(failed_inspect = cumsum(fail_result)) %>%
  # Remove the helper column fail_result
  select(-fail_result) %>%
  # Drop the grouping, then sort the data frame by Date
  ungroup() %>%
  arrange(Date)
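# Past counts exclude the current inspection: subtract 1 from the running
# inspection count, and subtract 1 from the running failure count when the
# current result is Fail (total_inspect is always at least 1, and
# failed_inspect is at least 1 on any Fail row, so no extra checks are needed)
dt3 <- dt2 %>%
  mutate(past_inspect = total_inspect - 1,
         past_failed_inspect = failed_inspect - ifelse(Result == "Fail", 1, 0)) %>%
  select(-total_inspect, -failed_inspect)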
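A more direct variant computes the past counts in a single pipeline, so the intermediate dt2 and the correction step are not needed. The running failure total is shifted down one row with dplyr::lag(). This sketch uses only dplyr, with base as.Date() standing in for lubridate's mdy(), and introduces a helper column failed and a result name dt_past for illustration only:

```r
library(dplyr)

# Example input: the three-column data frame from the question
dt1 <- read.csv(text = "Restaurant_ID,Date,Result
1,01/02/2011,Pass
2,02/05/2011,Pass
3,04/07/2011,Fail
1,09/05/2011,Fail
2,03/13/2012,Pass
1,08/25/2012,Fail
2,09/25/2012,Pass
3,01/05/2013,Pass", stringsAsFactors = FALSE)

dt_past <- dt1 %>%
  # Parse mm/dd/yyyy dates and flag failures as 1/0 (hypothetical helper column)
  mutate(Date = as.Date(Date, format = "%m/%d/%Y"),
         failed = as.integer(Result == "Fail")) %>%
  arrange(Restaurant_ID, Date) %>%
  group_by(Restaurant_ID) %>%
  mutate(
    # Inspections strictly before this one
    past_inspect = row_number() - 1L,
    # Running failure total shifted down one row within the group, so the
    # current inspection is excluded; the first inspection gets 0
    past_failed_inspect = dplyr::lag(cumsum(failed), default = 0L)
  ) %>%
  ungroup() %>%
  select(-failed) %>%
  arrange(Date)

dt_past
```

Writing dplyr::lag() explicitly avoids confusion with stats::lag(), which does something different on plain vectors.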