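I am organizing my daily activity data (accelerometer data). I want to sum the repeated days in the data and then average them, but only for days with A2.Working > 6 hours. A second condition is that a day must contain a full 24 hours to count as a valid day. A valid weekday has the three types A1.NonWorking, A2.Working and A4.SleepWeek, and their times add up to 24 hours (for example, Weekday 2 does not have 24 hours, because the accelerometer was attached on that day, a Tuesday). Here is a reproducible example: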

    library(dplyr)  # provides tibble() and %>%

    df <- tibble(
      LbNr = c(22002,22002,22002,22002,
               22002,22002,22002,22002,22002,22002,22002,22002,22002,
               22002,22002,22002,22002,22002,22002,22002,22002,22002,
               22002,22002,22002,22002),
      Type = c("A2.Working","A1.NonWorking",
               "A4.SleepWeek","A4.SleepWeek","A1.NonWorking","A2.Working",
               "A1.NonWorking","A1.NonWorking","A4.SleepWeek","A1.NonWorking",
               "A2.Working","A1.NonWorking","A4.SleepWeek","A4.SleepWeek",
               "A1.NonWorking","A2.Working","A1.NonWorking","C0.Leisure",
               "C4.SleepWeekend","C0.Leisure","C0.Leisure","C4.SleepWeekend",
               "C0.Leisure","C4.SleepWeekend","A4.SleepWeek","A1.NonWorking"),
      Weekday = c(2,2,2,3,3,3,3,4,4,4,4,4,4,5,5,5,5,6,6,6,7,7,7,7,1,1),
      Time = c(9.83333,6.05,0.11667,6.83333,1.33333,
               9.83333,6,0.03333,7.2,6.43333,5,5.23333,0.1,6.41667,0.96667,11.01667,
               5.6,0.43333,7.9,15.66667,0.03333,7.91667,15.61667,0.43333,6.33333,0.66667))

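One way to see which days are complete is to total Time per LbNr and Weekday. The sketch below rebuilds the example data so that it runs on its own; the names day_totals, total and complete, and the 0.01-hour tolerance for rounding, are assumptions made for illustration and are not part of the original post.

```r
library(dplyr)

# Same values as the example data; the scalar LbNr is recycled by tibble().
df <- tibble(
  LbNr = 22002,
  Type = c("A2.Working", "A1.NonWorking", "A4.SleepWeek",
           "A4.SleepWeek", "A1.NonWorking", "A2.Working", "A1.NonWorking",
           "A1.NonWorking", "A4.SleepWeek", "A1.NonWorking",
           "A2.Working", "A1.NonWorking", "A4.SleepWeek",
           "A4.SleepWeek", "A1.NonWorking", "A2.Working", "A1.NonWorking",
           "C0.Leisure", "C4.SleepWeekend", "C0.Leisure",
           "C0.Leisure", "C4.SleepWeekend", "C0.Leisure", "C4.SleepWeekend",
           "A4.SleepWeek", "A1.NonWorking"),
  Weekday = c(2, 2, 2, 3, 3, 3, 3, 4, 4, 4, 4, 4, 4,
              5, 5, 5, 5, 6, 6, 6, 7, 7, 7, 7, 1, 1),
  Time = c(9.83333, 6.05, 0.11667, 6.83333, 1.33333, 9.83333, 6,
           0.03333, 7.2, 6.43333, 5, 5.23333, 0.1,
           6.41667, 0.96667, 11.01667, 5.6,
           0.43333, 7.9, 15.66667, 0.03333, 7.91667, 15.61667, 0.43333,
           6.33333, 0.66667)
)

# Total recorded hours per participant and day.
day_totals <- df %>%
  group_by(LbNr, Weekday) %>%
  summarise(total = sum(Time), .groups = "drop") %>%
  # Assumed tolerance: the times are rounded, so totals are not exactly 24.
  mutate(complete = abs(total - 24) < 0.01)

day_totals
```

With this data, Weekdays 3 to 7 come out as complete, while Weekday 2 totals about 16 hours and Weekday 1 about 7 hours.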
I tried this code, which does not select specific days:

    df %>%
      group_by(LbNr, Type, Weekday) %>%
      summarise_all(., sum) %>%
      group_by(LbNr, Weekday) %>%
      filter(any((Time >= 6 & Type == "A2.Working") | Weekday == 6 | Weekday == 7)) %>%
      group_by(LbNr, Type) %>%
      select(-Weekday) %>%
      summarise_all(., mean, na.rm = TRUE)

However, when I run the code, I get the following:

       LbNr Type             Time
      <dbl> <chr>           <dbl>
    1 22002 A1.NonWorking    6.65
    2 22002 A2.Working      10.2
    3 22002 A4.SleepWeek     4.46
    4 22002 C0.Leisure      15.9
    5 22002 C4.SleepWeekend  8.12

If I add up the weekday types (6.65 + 10.20 + 4.46 = 21.31), I get the wrong result, because the averages for A1.NonWorking and A4.SleepWeek still include Weekday 2, which does not have a full 24 hours.

I want code that returns this result:

       LbNr Type             Time
      <dbl> <chr>           <dbl>
    1 22002 A1.NonWorking    6.95
    2 22002 A2.Working      10.4
    3 22002 A4.SleepWeek     6.62
    4 22002 C0.Leisure      15.9
    5 22002 C4.SleepWeekend  8.12

Here the weekday types add up to 6.95 + 10.40 + 6.62 = 23.97, almost 24 hours, which is the correct result.
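For this volunteer I obtained it with the following code:

    df %>%
      group_by(LbNr, Type, Weekday) %>%
      summarise_all(., sum) %>%
      filter(Weekday %in% 3:7) %>%
      group_by(LbNr, Weekday) %>%
      filter(any((Time >= 6 & Type == "A2.Working") | Weekday == 6 | Weekday == 7)) %>%
      group_by(LbNr, Type) %>%
      select(-Weekday) %>%
      summarise_all(., mean, na.rm = TRUE)

I added filter(Weekday %in% 3:7) because I know that the days it excludes, Weekdays 1 and 2, do not have 24 hours. I want code that returns the correct values without needing a hard-coded day filter like mine, by selecting only the days that have a full 24 hours.
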
Answer 0 (score: 2)

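You could try using filter with any to keep the Weekdays where Time > 6 for the Type A2.Working. This works after grouping by Weekday (all rows for that Weekday are kept if it meets the criterion). It also assumes that you want to include all of Weekdays 6 and 7 (which appear to be the weekend). Is this what you had in mind?

    library(dplyr)

    df %>%
      group_by(LbNr, Type, Weekday) %>%
      summarise_all(., sum) %>%
      group_by(LbNr, Weekday) %>%
      filter(any((Time > 6 & Type == "A2.Working") | Weekday == 6 | Weekday == 7)) %>%
      group_by(LbNr, Type) %>%
      select(-Weekday) %>%
      summarise_all(., mean, na.rm = TRUE)

       LbNr Type             Time
      <dbl> <chr>           <dbl>
    1 22002 A1.NonWorking    7.27
    2 22002 A2.Working      10.2
    3 22002 A4.SleepWeek     6.51
    4 22002 C0.Leisure      15.9
    5 22002 C4.SleepWeekend  8.12

Edit: Based on the comments, if you want to include a day only when its total time is approximately 24 hours, you can group by LbNr and Weekday and filter on the total Time for the day (using a threshold for "close to 24 hours").

Here is the resulting code, again assuming that days 6 and 7 are included (whether or not a full 24 hours of data were collected on the weekend). I included the logic that made sense to me, although it could be simplified further (for example, if the A2 time is < 6 hours, the 24-hour total criterion is not needed). I hope this is closer to what you need.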
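
The approach described in the edit could look roughly like the sketch below. It is not the answerer's original code, only one possible reading of the description: the variable names tolerance and result, the 0.1-hour tolerance, and the exact form of the filter condition are assumptions. Weekdays 6 and 7 are kept regardless of their totals, as the answer assumes.

```r
library(dplyr)

# Same values as the question's example data; the scalar LbNr is recycled.
df <- tibble(
  LbNr = 22002,
  Type = c("A2.Working", "A1.NonWorking", "A4.SleepWeek",
           "A4.SleepWeek", "A1.NonWorking", "A2.Working", "A1.NonWorking",
           "A1.NonWorking", "A4.SleepWeek", "A1.NonWorking",
           "A2.Working", "A1.NonWorking", "A4.SleepWeek",
           "A4.SleepWeek", "A1.NonWorking", "A2.Working", "A1.NonWorking",
           "C0.Leisure", "C4.SleepWeekend", "C0.Leisure",
           "C0.Leisure", "C4.SleepWeekend", "C0.Leisure", "C4.SleepWeekend",
           "A4.SleepWeek", "A1.NonWorking"),
  Weekday = c(2, 2, 2, 3, 3, 3, 3, 4, 4, 4, 4, 4, 4,
              5, 5, 5, 5, 6, 6, 6, 7, 7, 7, 7, 1, 1),
  Time = c(9.83333, 6.05, 0.11667, 6.83333, 1.33333, 9.83333, 6,
           0.03333, 7.2, 6.43333, 5, 5.23333, 0.1,
           6.41667, 0.96667, 11.01667, 5.6,
           0.43333, 7.9, 15.66667, 0.03333, 7.91667, 15.61667, 0.43333,
           6.33333, 0.66667)
)

tolerance <- 0.1  # assumed: hours of slack allowed around 24

result <- df %>%
  # Sum repeated Type entries within each day.
  group_by(LbNr, Weekday, Type) %>%
  summarise(Time = sum(Time), .groups = "drop") %>%
  group_by(LbNr, Weekday) %>%
  # Keep weekend days as they are; keep a weekday only if A2.Working
  # exceeds 6 hours and the day's total is close to 24 hours.
  filter(
    Weekday %in% c(6, 7) |
      (any(Type == "A2.Working" & Time > 6) &
         abs(sum(Time) - 24) < tolerance)
  ) %>%
  # Average each Type over the remaining days.
  group_by(LbNr, Type) %>%
  summarise(Time = mean(Time), .groups = "drop")

result
```

On the question's example data this keeps Weekdays 3, 5, 6 and 7, which are the days behind the result the question asks for, without a hard-coded day filter.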