How can I summarise all columns using group_by and summarise?

Time: 2019-07-28 17:48:17

Tags: r dplyr

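I am tidying my daily activity data (accelerometer data). I have 32 rows (some of them repeated) and 90 columns (the data for one subject). I want to sum the repeated rows for each day, across all columns.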

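I need to sum the repeated rows and tidy the data. I have tried several approaches, but almost all of them failed. The code below is the best I could get to work, and it is far too long. I would like to know whether there is another way to shorten it.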

# LbNr = subjects' id
# Weekday = 1 Monday.... 7 Sunday
# Type = activities: A1. NonWorking, A2. Working, A4. SleepWeek, C0. Leisure, C4. SleepWeekend

# code
df %>% select(LbNr, Type, Weekday, Time, lie:IncTrunkWalk) %>% 
  group_by(LbNr, Type, Weekday) %>% 
  summarise(n = n(), Time = sum(Time),lie   = sum(lie), sit = sum(sit), stand = sum(stand),
            move = sum(move),   walk = sum(walk), run = sum(run),   stairs = sum(stairs),
            cycle = sum(cycle), row = sum(row), Steps = sum(Steps), WalkSlow = sum(WalkSlow),
            WalkFast = sum(WalkFast),   Sit_Tmax = sum(Sit_Tmax),   Sit_P50 = sum(Sit_P50),
            Sit_T50 = sum(Sit_T50), Sit_P10 = sum(Sit_P10), Sit_P90 = sum(Sit_P90),
            Sit_T30min = sum(Sit_T30min),   Sit_N30min = sum(Sit_N30min),   NriseSit = sum(NriseSit),
            SitLie_Tmax = sum(SitLie_Tmax), SitLie_P50 = sum(SitLie_P50),   SitLie_T50 = sum(SitLie_T50),
            SitLie_P10 = sum(SitLie_P10),   SitLie_P90 = sum(SitLie_P90),   SitLie_T30min = sum(SitLie_T30min),
            SitLie_N30min = sum(SitLie_N30min), NriseSitLie = sum(NriseSitLie), Stand_Tmax = sum(Stand_Tmax),
            StandMove_Tmax = sum(StandMove_Tmax),   ArmOff = sum(ArmOff),   IncArm30 = sum(IncArm30),
            IncArm60 = sum(IncArm60),   IncArm90 = sum(IncArm90),   IncArm120 = sum(IncArm120),
            IncArm150 = sum(IncArm150), IncArmMax90 = sum(IncArmMax90), IncArmSit30 = sum(IncArmSit30),
            IncArmSit60 = sum(IncArmSit60), IncArmSit90 = sum(IncArmSit90), IncArmSit120 = sum(IncArmSit120),
            IncArmSit150 = sum(IncArmSit150),   IncArmSitMax90 = sum(IncArmSitMax90),   
            IncArmStandMove30 = sum(IncArmStandMove30), IncArmStandMove60 = sum(IncArmStandMove60), 
            IncArmStandMove90 = sum(IncArmStandMove90), IncArmStandMove120 = sum(IncArmStandMove120),   
            IncArmStandMove150 = sum(IncArmStandMove150), IncArmStandMoveMax90 = sum(IncArmStandMoveMax90),
            IncArmUpright30 = sum(IncArmUpright30), IncArmUpright60 = sum(IncArmUpright60),
            IncArmUpright90 = sum(IncArmUpright90), IncArmUpright120 = sum(IncArmUpright120),   
            IncArmUpright150 = sum(IncArmUpright150), IncArmUprightMax90 = sum(IncArmUprightMax90),
            IncArmPrctile10 = sum(IncArmPrctile10), IncArmPrctile50 = sum(IncArmPrctile50),
            IncArmPrctile90 = sum(IncArmPrctile90), TrunkOff = sum(TrunkOff),
            ForwIncTrunk20 = sum(ForwIncTrunk20),   ForwIncTrunk30  = sum(ForwIncTrunk30),
            ForwIncTrunk60 = sum(ForwIncTrunk60), ForwIncTrunk90 = sum(ForwIncTrunk90),
            ForwIncTrunkMax60 = sum(ForwIncTrunkMax60), ForwIncTrunkSit20 = sum(ForwIncTrunkSit20),
            ForwIncTrunkSit30 = sum(ForwIncTrunkSit30), ForwIncTrunkSit60 = sum(ForwIncTrunkSit60),
            ForwIncTrunkSit90 = sum(ForwIncTrunkSit90), ForwIncTrunkSitMax60 = sum(ForwIncTrunkSitMax60),
            ForwIncTrunkStandMove20 = sum(ForwIncTrunkStandMove20), ForwIncTrunkStandMove30 = sum(ForwIncTrunkStandMove30), 
            ForwIncTrunkStandMove60 = sum(ForwIncTrunkStandMove60), ForwIncTrunkStandMove90 = sum(ForwIncTrunkStandMove90),
            ForwIncTrunkStandMoveMax60 = sum(ForwIncTrunkStandMoveMax60), ForwIncTrunkUpright20 = sum(ForwIncTrunkUpright20),   
            ForwIncTrunkUpright30 = sum(ForwIncTrunkUpright30), ForwIncTrunkUpright60  = sum(ForwIncTrunkUpright60),
            ForwIncTrunkUpright90 = sum(ForwIncTrunkUpright90), ForwIncTrunkUprightMax60 = sum(ForwIncTrunkUprightMax60),   
            IncTrunkWalk = sum(IncTrunkWalk)) %>% 
  arrange(Weekday) %>% filter(Weekday %in% 3:7)

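I also have another problem, with the third code below. At first sight it works well, but Saturday ("6") is a problem: when I join the data, Saturday can pick up an activity that started on Friday (see the example below). Depending on the volunteer, this shows up as A1. NonWorking or A4. SleepWeek. I would like to add this stray activity to C0. Leisure.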

#    LbNr    Type            Weekday  SB    LPA    MVPA
#   <dbl>   <chr>            <dbl>   <dbl>  <dbl>   <dbl>   
#13 22002 A1. NonWorking         6   0.319 0.101  0.0131 
#14 22002 C0. Leisure            6   10.0   4.93   0.714  
#15 22002 C4. SleepWeekend       6   7.88  0.0147 0.00278

# third code
df %>% select(LbNr,Type,Weekday,lie,sit,stand,move,WalkSlow,run,stairs,cycle,WalkFast) %>% 
  mutate(SB = lie+sit, LPA = stand+move+WalkSlow, MVPA = run+stairs+cycle+WalkFast) %>% 
  group_by(LbNr,Type, Weekday) %>% 
  summarise(SB = sum(SB), LPA = sum(LPA), MVPA = sum(MVPA)) %>% 
  arrange(Weekday) %>% filter(Weekday %in% 3:7)

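For the first problem, I expect a table containing the sum of the repeated rows for all columns. And, if possible, I would like better code for summing the different activities on Saturday.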

Thanks in advance, Luis

1 answer:

Answer 0: (score: 0)

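It is hard to answer your question without a better example (for instance, you could use dput() on your data to provide a sample). Still, here is a solution to your last question: "For the first problem, I expect a table containing the sum of the repeated rows for all columns. And, if possible, I would like better code for summing the different activities on Saturday."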

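library(dplyr)  # %>%, group_by, summarise, filter
library(tidyr)  # gather, spread

# create toy data of 3 different IDs, 3 different types, and repeated days
# (no seed is set, so the sampled values differ on every run)
df <- data.frame(id = sample(1:3, 100, replace = TRUE),
                 type = sample(letters[1:3], 100, replace = TRUE),
                 day = sample(1:7, 100, replace = TRUE),
                 matrix(runif(300), nrow = 100),
                 stringsAsFactors = FALSE)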

# gather data, summarize each activity column by ID, type and day
# and select Saturday==6
df %>% gather(k,v,-id,-type,-day) %>% 
  group_by(id,type,day,k) %>% 
  summarise(sum=sum(v)) %>% 
  filter(day==6) %>% 
  spread(k,sum)

# A tibble: 8 x 6
# Groups:   id, type, day [8]
     id type    day    X1    X2    X3
  <int> <chr> <int> <dbl> <dbl> <dbl>
1     1 a         6 1.85  3.26  2.09 
2     1 b         6 0.604 0.583 0.586
3     1 c         6 0.163 0.663 0.624
4     2 a         6 0.185 0.952 0.349
5     2 b         6 1.16  0.832 0.974
6     2 c         6 0.906 1.62  0.853
7     3 b         6 0.671 1.39  0.887
8     3 c         6 0.449 0.150 0.647
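
As a possible alternative that avoids reshaping, the sketch below sums every non-grouping column with across() and, before grouping, recodes the Saturday rows that the question wants counted as leisure. It assumes dplyr 1.0.0 or later (for across() and .groups); on older versions summarise_all(sum) covers the summing step. The toy values and the helper name saturday_to_leisure are made up for illustration; only the column names and Type labels come from the question.

```r
library(dplyr)

# Hypothetical toy data: column names mirror the question, values are invented
df <- tibble(
  LbNr    = c(22002, 22002, 22002, 22002, 22002),
  Type    = c("A1. NonWorking", "C0. Leisure", "C0. Leisure",
              "C4. SleepWeekend", "A2. Working"),
  Weekday = c(6, 6, 6, 6, 3),
  lie     = c(1, 2, 3, 4, 5),
  sit     = c(10, 20, 30, 40, 50)
)

# Types to fold into leisure on Saturday (assumption based on the question)
saturday_to_leisure <- c("A1. NonWorking", "A4. SleepWeek")

result <- df %>%
  # Recode Saturday spill-over rows before grouping
  mutate(Type = if_else(Weekday == 6 & Type %in% saturday_to_leisure,
                        "C0. Leisure", Type)) %>%
  group_by(LbNr, Type, Weekday) %>%
  # Sum every non-grouping column; n = n() comes last so across() does not sum it
  summarise(across(everything(), sum), n = n(), .groups = "drop") %>%
  filter(Weekday %in% 3:7) %>%
  arrange(Weekday)

print(result)
```

With the real data, the same pipeline would start from select(LbNr, Type, Weekday, Time, lie:IncTrunkWalk), so that only the columns to be summed reach summarise(). Whether a plain sum is meaningful for every column (percentile columns such as Sit_P50, for example) is a question for the data owner.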