Summing across rows conditional on the values of other columns

Date: 2014-03-28 10:52:22

Tags: r sum row conditional

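I'm a new user and I'm hoping someone can point me in the right direction: which function should I use to achieve the following?

I have the following data frame, shown here as dput output.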

structure(list(ID = 4701:4702, Date.1 = structure(c(5L, 5L), .Label = c("01/02/2013", 
"01/03/2013", "01/05/2013", "02/05/2013", "04/02/2013", "04/03/2013", 
"05/02/2013", "05/03/2013", "06/02/2013", "06/03/2013", "07/02/2013", 
"07/03/2013", "08/02/2013", "08/07/2013", "12/12/2012", "13/12/2012", 
"14/01/2013", "14/12/2012", "15/01/2013", "16/01/2013", "17/01/2013", 
"17/12/2012", "18/01/2013", "18/04/2013", "18/12/2012", "19/04/2013", 
"23/01/2013", "24/01/2013", "25/01/2013", "26/04/2013", "28/01/2013", 
"29/01/2013", "29/04/2013", "30/04/2013", "31/01/2013"), class = "factor"), 
 Day.of.Week.1 = structure(c(2L, 2L), .Label = c("Friday", 
"Monday", "Thursday", "Tuesday", "Wednesday"), class = "factor"), 
Sedentary.1 = c(511.5, 405.5), Light.1 = c(133.666666666667, 
119.166666666667), Moderate.1 = c(12.1666666666667, 13.1666666666667
), Vigorous.1 = c(4.33333333333333, 3.5), Axis.1.Counts.1 = c(157124L, 
126177L), Axis.1.CPM.1 = c(237.5, 233.1), Time.1 = c(661.67, 
541.33), Day.of.Week.2 = structure(c(1L, 4L), .Label = c("Friday", 
"Monday", "Thursday", "Tuesday", "Wednesday"), class = "factor"), 
Sedentary.2 = c(370.166666666667, 601.833333333333), Light.2 = c(113, 
162.5), Moderate.2 = c(12, 13), Vigorous.2 = c(4, 10), Axis.1.Counts.2 = c(141593L, 
201373L), Axis.1.CPM.2 = c(283.7, 255.8), Number.of.Epochs.2 = c(2995L, 
4724L), Time.2 = c(499.17, 787.33), Day.of.Week.3 = structure(c(NA, 
5L), .Label = c("Friday", "Monday", "Thursday", "Tuesday", 
"Wednesday"), class = "factor"), Sedentary.3 = c(NA, 463), 
Light.3 = c(NA, 121.666666666667), Moderate.3 = c(NA, 14.5
), Vigorous.3 = c(NA, 11.5), Axis.1.Counts.3 = c(NA, 196192L
), Axis.1.CPM.3 = c(NA, 321.3), Number.of.Epochs.3 = c(NA, 
3664L), Time.3 = c(NA, 610.67), Day.of.Week.4 = structure(c(NA, 
3L), .Label = c("Friday", "Monday", "Thursday", "Tuesday", 
"Wednesday"), class = "factor"), Sedentary.4 = c(NA, 472.333333333333
), Light.4 = c(NA, 149.166666666667), Moderate.4 = c(NA, 
11.3333333333333), Vigorous.4 = c(NA, 14.1666666666667), 
Axis.1.Counts.4 = c(NA, 218895L), Axis.1.CPM.4 = c(NA, 338.3
), Number.of.Epochs.4 = c(NA, 3882L), Time.4 = c(NA, 647), 
Day.of.Week.5 = structure(c(NA, 1L), .Label = c("Friday", 
"Monday", "Thursday", "Tuesday", "Wednesday"), class = "factor"), 
Sedentary.5 = c(NA, 383.166666666667), Light.5 = c(NA, 106.5
), Moderate.5 = c(NA, 8), Vigorous.5 = c(NA, 0.5), Axis.1.Counts.5 = c(NA, 
92163L), Axis.1.CPM.5 = c(NA, 185), Number.of.Epochs.5 = c(NA, 
2989L), Time.5 = c(NA, 498.17)), .Names = c("ID", "Date.1", 
"Day.of.Week.1", "Sedentary.1", "Light.1", "Moderate.1", "Vigorous.1", 
"Axis.1.Counts.1", "Axis.1.CPM.1", "Time.1", "Day.of.Week.2", 
"Sedentary.2", "Light.2", "Moderate.2", "Vigorous.2", "Axis.1.Counts.2", 
"Axis.1.CPM.2", "Number.of.Epochs.2", "Time.2", "Day.of.Week.3", 
"Sedentary.3", "Light.3", "Moderate.3", "Vigorous.3", "Axis.1.Counts.3", 
"Axis.1.CPM.3", "Number.of.Epochs.3", "Time.3", "Day.of.Week.4",  
"Sedentary.4", "Light.4", "Moderate.4", "Vigorous.4", "Axis.1.Counts.4", 
"Axis.1.CPM.4", "Number.of.Epochs.4", "Time.4", "Day.of.Week.5", 
"Sedentary.5", "Light.5", "Moderate.5", "Vigorous.5", "Axis.1.Counts.5", 
"Axis.1.CPM.5", "Number.of.Epochs.5", "Time.5"), reshapeWide = structure(list(
v.names = NULL, timevar = "ID2", idvar = "ID", times = 1:5, 
varying = structure(c("Filename.1", "Epoch.1", "Weight..kg..1", 
"Age.1", "Gender.1", "Date.1", "Day.of.Week.1", "Day.of.Week.Num.1", 
"Sedentary.1", "Light.1", "Moderate.1", "Vigorous.1", "Axis.1.Counts.1", 
"Axis.1.Average.Counts.1", "Axis.1.CPM.1", "Number.of.Epochs.1", 
"Time.1", "Calendar.Days.1", "Filename.2", "Epoch.2", "Weight..kg..2", 
"Age.2", "Gender.2", "Date.2", "Day.of.Week.2", "Day.of.Week.Num.2", 
"Sedentary.2", "Light.2", "Moderate.2", "Vigorous.2", "Axis.1.Counts.2", 
"Axis.1.Average.Counts.2", "Axis.1.CPM.2", "Number.of.Epochs.2", 
"Time.2", "Calendar.Days.2", "Filename.3", "Epoch.3", "Weight..kg..3", 
"Age.3", "Gender.3", "Date.3", "Day.of.Week.3", "Day.of.Week.Num.3", 
"Sedentary.3", "Light.3", "Moderate.3", "Vigorous.3", "Axis.1.Counts.3", 
"Axis.1.Average.Counts.3", "Axis.1.CPM.3", "Number.of.Epochs.3", 
"Time.3", "Calendar.Days.3", "Filename.4", "Epoch.4", "Weight..kg..4", 
"Age.4", "Gender.4", "Date.4", "Day.of.Week.4", "Day.of.Week.Num.4", 
"Sedentary.4", "Light.4", "Moderate.4", "Vigorous.4", "Axis.1.Counts.4", 
"Axis.1.Average.Counts.4", "Axis.1.CPM.4", "Number.of.Epochs.4", 
"Time.4", "Calendar.Days.4", "Filename.5", "Epoch.5", "Weight..kg..5", 
"Age.5", "Gender.5", "Date.5", "Day.of.Week.5", "Day.of.Week.Num.5", 
"Sedentary.5", "Light.5", "Moderate.5", "Vigorous.5", "Axis.1.Counts.5", 
"Axis.1.Average.Counts.5", "Axis.1.CPM.5", "Number.of.Epochs.5", 
"Time.5", "Calendar.Days.5"), .Dim = c(18L, 5L))), .Names = c("v.names", 
"timevar", "idvar", "times", "varying")), row.names = c(1L, 3L
), class = "data.frame")

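For each row, I want to sum ACROSS the columns Sedentary.1, Sedentary.2, Sedentary.3, Sedentary.4 and Sedentary.5. However, I only want a column to be included in the calculation if another column meets a certain criterion.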

Include column:

- Sedentary.1 if the value in Time.1 >= 377
- Sedentary.2 if the value in Time.2 >= 377
- Sedentary.3 if the value in Time.3 >= 377
- Sedentary.4 if the value in Time.4 >= 377
- Sedentary.5 if the value in Time.5 >= 377

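I can do this in Excel with the SUMIF function, but I don't know where to start in R. If you could point me to functions I can read up on, I would be very grateful.

Many thanks,

3 Answers:

Answer 0: (score: 0)

Indexing on the other column should get you started.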

sum(df$Sedentary.1[df$Time.1 >= 377])
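One thing to watch for with this kind of indexing: several Time columns in the question contain NA, and an NA in the condition carries through to the sum. A minimal sketch of the behaviour, using a hypothetical two-row data frame called `toy` rather than the real data:

```r
# Hypothetical toy data; Time.3 is missing for the first row,
# as it is for ID 4701 in the question's data.
toy <- data.frame(
  Sedentary.3 = c(NA, 463),
  Time.3      = c(NA, 610.67)
)

# An NA in the condition selects an NA element, so a plain sum() is NA.
plain <- sum(toy$Sedentary.3[toy$Time.3 >= 377])

# Either drop NAs when summing ...
with_na_rm <- sum(toy$Sedentary.3[toy$Time.3 >= 377], na.rm = TRUE)

# ... or use which(), which keeps only the positions that are TRUE.
with_which <- sum(toy$Sedentary.3[which(toy$Time.3 >= 377)])
```

Both `na.rm = TRUE` and `which()` give the same total here; which one reads better is largely a matter of taste.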

The plyr package is a good way to get the sums of several columns at once.

library(plyr)

df2 <- ddply(df, .(), summarise, Sedentary.1 = sum(Sedentary.1[Time.1 >= 377], na.rm = TRUE), 
             Sedentary.2 = sum(Sedentary.2[Time.2 >= 377], na.rm = TRUE))

   .id Sedentary.1 Sedentary.2
1 <NA>         917         972
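If you would rather not load plyr, roughly the same per-column totals can probably be had in base R with `mapply`, pairing each Sedentary column with its Time column. This is only a sketch on a hypothetical `toy` data frame with made-up values, not the data from the question:

```r
# Hypothetical toy data with two days per ID.
toy <- data.frame(
  Sedentary.1 = c(511.5, 405.5),
  Sedentary.2 = c(370, 602),
  Time.1      = c(661.67, 541.33),
  Time.2      = c(499.17, 300)
)

# For each (Sedentary, Time) pair, sum the Sedentary values
# whose Time is at least 377; which() drops NA conditions.
col_sums <- mapply(
  function(sed, time) sum(sed[which(time >= 377)]),
  toy[c("Sedentary.1", "Sedentary.2")],
  toy[c("Time.1", "Time.2")]
)
```

The result is a named numeric vector with one total per Sedentary column.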

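Answer 1: (score: 0)

There may be a more efficient and/or cleaner way, but here I find which Time columns are not NA and meet your criterion, then take rowSums after multiplying the Sedentary columns by that result. TRUE is treated as 1 and FALSE as 0, so the result is the row-wise sum of the values that meet the criterion, because the unwanted Sedentary values are multiplied by 0 before summing.

x is the name of the data frame you provided.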

rowSums(x[c("Sedentary.1","Sedentary.2","Sedentary.3","Sedentary.4","Sedentary.5")] * (!is.na(x[,c("Time.1","Time.2","Time.3","Time.4","Time.5")]) & x[,c("Time.1","Time.2","Time.3","Time.4","Time.5")] >= 377), na.rm=TRUE)
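To see the TRUE/FALSE multiplication at work on something small, here is a sketch with a hypothetical two-day data frame (`toy`, `sed`, `time` and `keep` are illustrative names, not part of the question's data):

```r
# Hypothetical two-day toy data, not the question's data frame.
toy <- data.frame(
  Sedentary.1 = c(500, 400),
  Sedentary.2 = c(300, 600),
  Time.1      = c(660, 540),
  Time.2      = c(200, NA)
)

sed  <- toy[, c("Sedentary.1", "Sedentary.2")]
time <- toy[, c("Time.1", "Time.2")]

# Logical matrix: TRUE where Time is present and at least 377.
keep <- !is.na(time) & time >= 377

# TRUE counts as 1 and FALSE as 0, so excluded values contribute 0.
masked <- sed * keep
row_totals <- rowSums(masked, na.rm = TRUE)
```

In both rows only day 1 qualifies (day 2 is too short in the first row and missing in the second), so the totals are just the Sedentary.1 values.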

Edit for the question in the comments:

Something like this should work:

# make TRUE/FALSE table
TF = !is.na(x[,c("Time.1","Time.2","Time.3","Time.4","Time.5")]) & x[,c("Time.1","Time.2","Time.3","Time.4","Time.5")] >= 377

# take rowSums of Sedentary.x when TF rowSums are greater than or equal to 3
rowSums(x[rowSums(TF) >= 3,c("Sedentary.1","Sedentary.2","Sedentary.3","Sedentary.4","Sedentary.5")] * TF[rowSums(TF) >= 3,], na.rm=TRUE)

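You could combine this into a single line if you like, but I split it into stages, saving the TRUE/FALSE table as "TF", for readability.

Answer 2: (score: 0)

This is how I did it. First I find which Time* columns have a value >= 377, then multiply that by a data.frame that is just the subset of Sedentary* columns. R treats TRUE as 1 and FALSE as 0, so values where there is a FALSE become 0. If there is an NA, the value stays NA.

This code assumes that the Time and Sedentary columns are listed in the same order.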
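If you need this more than once, the stages could be wrapped in a small function. The sketch below is only one way to do it; `conditional_row_sums` and its arguments (`n_days`, `threshold`, `min_valid`) are hypothetical names, and it assumes the columns are named `Sedentary.<k>` and `Time.<k>`:

```r
# Hypothetical helper; the name and arguments are illustrative only.
conditional_row_sums <- function(x, n_days = 5, threshold = 377, min_valid = 3) {
  sed_cols  <- paste0("Sedentary.", seq_len(n_days))
  time_cols <- paste0("Time.", seq_len(n_days))

  # TRUE where the day's Time is present and meets the threshold
  times <- x[, time_cols, drop = FALSE]
  TF <- !is.na(times) & times >= threshold

  # rows with at least min_valid qualifying days
  enough <- rowSums(TF) >= min_valid

  # keep only those rows, then sum the masked Sedentary values
  rowSums(x[enough, sed_cols, drop = FALSE] * TF[enough, , drop = FALSE],
          na.rm = TRUE)
}

# Toy data with two days; only the second row has two valid days.
toy <- data.frame(
  Sedentary.1 = c(500, 400), Sedentary.2 = c(300, 600),
  Time.1      = c(660, 540), Time.2      = c(200, 500)
)
res <- conditional_row_sums(toy, n_days = 2, min_valid = 2)
```

Rows that do not reach `min_valid` qualifying days are dropped from the result rather than reported as NA, matching the subsetting in the code above.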

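# Time.* columns, selected by name pattern
sub.time <- mydf[, grepl("Time", names(mydf)), drop = FALSE]
# TRUE where the threshold is met, NA where Time is missing
sumif <- sub.time >= 377
# Sedentary.* columns, assumed to be in the same order as the Time.* columns
sub.sed <- mydf[, grepl("Sedentary", names(mydf)), drop = FALSE]
# row-wise sum; NA products are dropped by na.rm = TRUE
apply(sub.sed * sumif, MARGIN = 1, sum, na.rm = TRUE)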

        1         3 
 881.6667 2325.8333
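If you cannot rely on the Time and Sedentary columns being in the same order, one option is to build the Time column names from the numeric suffix of each Sedentary column. A sketch under that assumption, using a hypothetical `toy` data frame whose columns are deliberately shuffled:

```r
# Hypothetical toy data with the columns deliberately out of order.
toy <- data.frame(
  Time.2      = c(200, 500),
  Sedentary.1 = c(500, 400),
  Time.1      = c(660, 540),
  Sedentary.2 = c(300, 600)
)

# Build matching column names from the shared numeric suffix.
sed_cols  <- grep("^Sedentary\\.", names(toy), value = TRUE)
suffix    <- sub("^Sedentary\\.", "", sed_cols)
time_cols <- paste0("Time.", suffix)
stopifnot(all(time_cols %in% names(toy)))

# Same masking idea as before, but with explicitly paired columns.
sumif  <- toy[, time_cols, drop = FALSE] >= 377
totals <- rowSums(toy[, sed_cols, drop = FALSE] * sumif, na.rm = TRUE)
```

Selecting by name this way pairs Sedentary.1 with Time.1 and Sedentary.2 with Time.2 regardless of where they sit in the data frame.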