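I am tidying my daily-activity data (accelerometer data). I have 32 rows (some are duplicated) and 90 columns (the data for one subject). I want to sum the duplicated rows for each day across all columns.
I need to sum the duplicated rows and tidy the data. I tried several approaches, but almost all of them failed. The code below is the best I could get, and it is far too long. I would like to know whether there is another way to shorten it.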
# LbNr = subjects' id
# Weekday = 1 Monday.... 7 Sunday
# Type = activities: A1. NonWorking, A2. Working, A4. SleepWeek, C0. Leisure, C4. SleepWeekend
# code
df %>% select(LbNr, Type, Weekday, Time, lie:IncTrunkWalk) %>%
group_by(LbNr, Type, Weekday) %>%
summarise(n = n(), Time = sum(Time),lie = sum(lie), sit = sum(sit), stand = sum(stand),
move = sum(move), walk = sum(walk), run = sum(run), stairs = sum(stairs),
cycle = sum(cycle), row = sum(row), Steps = sum(Steps), WalkSlow = sum(WalkSlow),
WalkFast = sum(WalkFast), Sit_Tmax = sum(Sit_Tmax), Sit_P50 = sum(Sit_P50),
Sit_T50 = sum(Sit_T50), Sit_P10 = sum(Sit_P10), Sit_P90 = sum(Sit_P90),
Sit_T30min = sum(Sit_T30min), Sit_N30min = sum(Sit_N30min), NriseSit = sum(NriseSit),
SitLie_Tmax = sum(SitLie_Tmax), SitLie_P50 = sum(SitLie_P50), SitLie_T50 = sum(SitLie_T50),
SitLie_P10 = sum(SitLie_P10), SitLie_P90 = sum(SitLie_P90), SitLie_T30min = sum(SitLie_T30min),
SitLie_N30min = sum(SitLie_N30min), NriseSitLie = sum(NriseSitLie), Stand_Tmax = sum(Stand_Tmax),
StandMove_Tmax = sum(StandMove_Tmax), ArmOff = sum(ArmOff), IncArm30 = sum(IncArm30),
IncArm60 = sum(IncArm60), IncArm90 = sum(IncArm90), IncArm120 = sum(IncArm120),
IncArm150 = sum(IncArm150), IncArmMax90 = sum(IncArmMax90), IncArmSit30 = sum(IncArmSit30),
IncArmSit60 = sum(IncArmSit60), IncArmSit90 = sum(IncArmSit90), IncArmSit120 = sum(IncArmSit120),
IncArmSit150 = sum(IncArmSit150), IncArmSitMax90 = sum(IncArmSitMax90),
IncArmStandMove30 = sum(IncArmStandMove30), IncArmStandMove60 = sum(IncArmStandMove60),
IncArmStandMove90 = sum(IncArmStandMove90), IncArmStandMove120 = sum(IncArmStandMove120),
IncArmStandMove150 = sum(IncArmStandMove150), IncArmStandMoveMax90 = sum(IncArmStandMoveMax90),
IncArmUpright30 = sum(IncArmUpright30), IncArmUpright60 = sum(IncArmUpright60),
IncArmUpright90 = sum(IncArmUpright90), IncArmUpright120 = sum(IncArmUpright120),
IncArmUpright150 = sum(IncArmUpright150), IncArmUprightMax90 = sum(IncArmUprightMax90),
IncArmPrctile10 = sum(IncArmPrctile10), IncArmPrctile50 = sum(IncArmPrctile50),
IncArmPrctile90 = sum(IncArmPrctile90), TrunkOff = sum(TrunkOff),
ForwIncTrunk20 = sum(ForwIncTrunk20), ForwIncTrunk30 = sum(ForwIncTrunk30),
ForwIncTrunk60 = sum(ForwIncTrunk60), ForwIncTrunk90 = sum(ForwIncTrunk90),
ForwIncTrunkMax60 = sum(ForwIncTrunkMax60), ForwIncTrunkSit20 = sum(ForwIncTrunkSit20),
ForwIncTrunkSit30 = sum(ForwIncTrunkSit30), ForwIncTrunkSit60 = sum(ForwIncTrunkSit60),
ForwIncTrunkSit90 = sum(ForwIncTrunkSit90), ForwIncTrunkSitMax60 = sum(ForwIncTrunkSitMax60),
ForwIncTrunkStandMove20 = sum(ForwIncTrunkStandMove20), ForwIncTrunkStandMove30 = sum(ForwIncTrunkStandMove30),
ForwIncTrunkStandMove60 = sum(ForwIncTrunkStandMove60), ForwIncTrunkStandMove90 = sum(ForwIncTrunkStandMove90),
ForwIncTrunkStandMoveMax60 = sum(ForwIncTrunkStandMoveMax60), ForwIncTrunkUpright20 = sum(ForwIncTrunkUpright20),
ForwIncTrunkUpright30 = sum(ForwIncTrunkUpright30), ForwIncTrunkUpright60 = sum(ForwIncTrunkUpright60),
ForwIncTrunkUpright90 = sum(ForwIncTrunkUpright90), ForwIncTrunkUprightMax60 = sum(ForwIncTrunkUprightMax60),
IncTrunkWalk = sum(IncTrunkWalk)) %>%
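arrange(Weekday) %>% filter(Weekday %in% 3:7)
I also have another problem, with the third code. At first it runs fine, but I have an issue on Saturday ("6"): when I concatenate the data, Saturday can pick up activities that started on Friday (see the example below). These sometimes appear as A1. NonWorking or A4. SleepWeek, depending on the volunteer. I would like to add these other activities into C0. Leisure.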
# LbNr Type Weekday SB LPA MVPA
# <dbl> <chr> <dbl> <dbl> <dbl> <dbl>
#13 22002 A1. NonWorking 6 0.319 0.101 0.0131
#14 22002 C0. Leisure 6 10.0 4.93 0.714
#15 22002 C4. SleepWeekend 6 7.88 0.0147 0.00278
# third code
df %>% select(LbNr,Type,Weekday,lie,sit,stand,move,WalkSlow,run,stairs,cycle,WalkFast) %>%
mutate(SB = lie+sit, LPA = stand+move+WalkSlow, MVPA = run+stairs+cycle+WalkFast) %>%
group_by(LbNr,Type, Weekday) %>%
summarise(SB = sum(SB), LPA = sum(LPA), MVPA = sum(MVPA)) %>%
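arrange(Weekday) %>% filter(Weekday %in% 3:7)
For the first problem, I expect a table with the sum of the duplicated rows for all columns. And, if possible, I would like better code for summing the different activities on Saturday.
Thanks in advance, Luis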
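Answer 0 (score: 0)
It is hard to answer your question without a better example (for instance, you could use dput() on your data to provide a sample). However, here is a solution to your last request: "For the first problem, I expect a table with the sum of the duplicated rows for all columns. And, if possible, I would like better code for summing the different activities on Saturday."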
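# load the packages that provide %>%, group_by(), gather() and spread()
library(dplyr)
library(tidyr)
# create toy data of 3 different IDs, 3 different types, and repeated days
df <- data.frame(id = sample(1:3, 100, TRUE),
                 type = sample(letters[1:3], 100, TRUE),
                 day = sample(1:7, 100, TRUE),
                 matrix(runif(300), nrow = 100),
                 stringsAsFactors = FALSE)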
# gather data, summarize each activity column by ID, type and day
# and select Saturday==6
df %>% gather(k,v,-id,-type,-day) %>%
group_by(id,type,day,k) %>%
summarise(sum=sum(v)) %>%
filter(day==6) %>%
spread(k,sum)
# A tibble: 8 x 6
# Groups:   id, type, day [8]
#      id type    day    X1    X2    X3
#   <int> <chr> <int> <dbl> <dbl> <dbl>
# 1     1 a         6 1.85  3.26  2.09
# 2     1 b         6 0.604 0.583 0.586
# 3     1 c         6 0.163 0.663 0.624
# 4     2 a         6 0.185 0.952 0.349
# 5     2 b         6 1.16  0.832 0.974
# 6     2 c         6 0.906 1.62  0.853
# 7     3 b         6 0.671 1.39  0.887
# 8     3 c         6 0.449 0.150 0.647
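If you have dplyr 1.0.0 or later, across() may be a shorter route to the same result, without reshaping. The sketch below is only an illustration under assumptions: the toy data is made up, it borrows a few column names from the question (Time, lie, sit), and it assumes that any A1. NonWorking or A4. SleepWeek row on Saturday is Friday spill-over that should be counted as C0. Leisure. On the real data the column range would presumably be Time:IncTrunkWalk.

```r
library(dplyr)

# Hypothetical toy data using the question's column names (values are made up)
df <- tibble(
  LbNr    = c(22002, 22002, 22002, 22002, 22002),
  Type    = c("A1. NonWorking", "C0. Leisure", "C0. Leisure",
              "C4. SleepWeekend", "A2. Working"),
  Weekday = c(6, 6, 6, 6, 5),
  Time    = c(1, 2, 3, 4, 5),
  lie     = c(0.5, 1, 1.5, 4, 0),
  sit     = c(0.5, 1, 1.5, 0, 5)
)

totals <- df %>%
  # Assumption: on Saturday (6), A1. NonWorking and A4. SleepWeek rows
  # are relabelled so that they are summed together with C0. Leisure
  mutate(Type = if_else(Weekday == 6 &
                          Type %in% c("A1. NonWorking", "A4. SleepWeek"),
                        "C0. Leisure", Type)) %>%
  group_by(LbNr, Type, Weekday) %>%
  # n counts the rows collapsed into each group; across() sums every
  # column in the range Time:sit (Time:IncTrunkWalk on the real data)
  summarise(n = n(), across(Time:sit, sum), .groups = "drop") %>%
  filter(Weekday %in% 3:7) %>%
  arrange(Weekday)

print(totals)
```

The relabelling happens before group_by(), so the spill-over rows land in the same group as the Saturday leisure rows and are summed with them. If some volunteers really do have non-working time on Saturday, the condition inside if_else() would need to be narrowed.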