I have some code that is repeated 24 times, once for each hour of the day. I would like to know whether it can be simplified:
SBS00<-colSums(subset(Total[c(14:54)],Total$Hour=="00:00:00"|Total$Group=="SBS"))
SBS01<-colSums(subset(Total[c(14:54)],Total$Hour=="01:00:00"|Total$Group=="SBS"))
SBS02<-colSums(subset(Total[c(14:54)],Total$Hour=="02:00:00"|Total$Group=="SBS"))
SBS03<-colSums(subset(Total[c(14:54)],Total$Hour=="03:00:00"|Total$Group=="SBS"))
...
SBS23<-colSums(subset(Total[c(14:54)],Total$Hour=="23:00:00"|Total$Group=="SBS"))
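So the general idea is to create 24 new variables, SBS00 through SBS23.
After running that code, I need to combine the results into a single data frame with the following: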
SBS <- data.frame(SBS00,SBS01,SBS02,SBS03,...,SBS23)
Is there a way to clean this up?
I also have another piece of code that needs to be condensed:
SlopeSBS00<-lm(SBSNy$SBS00[c(1:10,17:41)] ~ Numbers[c(1:10,17:41)])$coeff[2]
SlopeSBS01<-lm(SBSNy$SBS01[c(1:10,17:41)] ~ Numbers[c(1:10,17:41)])$coeff[2]
SlopeSBS02<-lm(SBSNy$SBS02[c(1:10,17:41)] ~ Numbers[c(1:10,17:41)])$coeff[2]
SlopeSBS03<-lm(SBSNy$SBS03[c(1:10,17:41)] ~ Numbers[c(1:10,17:41)])$coeff[2]
...
SlopeSBS23<-lm(SBSNy$SBS23[c(1:10,17:41)] ~ Numbers[c(1:10,17:41)])$coeff[2]
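Here SBSNy is a transformed version of the earlier SBS, and Numbers is the numeric vector 1:41. Each line therefore fits a linear regression of one SBSNy column (SBS00 through SBS23) on Numbers, using only elements 1:10 and 17:41. coeff[2] extracts just the slope, which is all that is needed here.
Finally, I have a different piece of code that needs cleaning up, which looks like this: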
Total$Base00 <- (Total$base + Total$base*dataval*11)
Total$Base01 <- (Total$base + Total$base*dataval*12)
Total$Base02 <- (Total$base + Total$base*dataval*13)
Total$Base03 <- (Total$base + Total$base*dataval*14)
...
Total$Base30 <- (Total$base + Total$base*dataval*41)
That gives 31 Base variables in total, from 00 to 30.
This code then follows:
Total$Uplift00 <- (Total$cols11 - Total$Base00)
Total$Uplift01 <- (Total$cols12 - Total$Base01)
Total$Uplift02 <- (Total$cols13 - Total$Base02)
Total$Uplift03 <- (Total$cols14 - Total$Base03)
...
Total$Uplift30 <- (Total$cols41 - Total$Base30)
I hope you can help, as this would simplify my code a lot!
Answer 0: (score: 2)
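You can use sapply/lapply to do this for multiple columns. Note that the condition below uses & rather than the | in the question, on the assumption that you want the rows matching both the hour and the group.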
Hr <- sprintf('%02d:00:00',0:23)
SBS <- do.call(cbind, lapply(Hr, function(x)
    colSums(subset(Total[14:54], Total$Hour==x & Total$Group=='SBS'))))
colnames(SBS) <- sprintf('SBS%02d', 0:23)
Or using dplyr
library(dplyr)
Total %>%
filter(Group=='SBS') %>%
group_by(Hour) %>%
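# across() replaces the deprecated summarise_each()/funs(); needs dplyr >= 1.0.0
summarise(across(all_of(names(Total)[14:54]), sum)) %>%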
select(-Hour) %>%
t()
Or using aggregate from base R
T1 <- cbind(Total[c(14:54)], Total['Hour'])
t(aggregate(.~Hour, T1, subset=Total$Group=='SBS', FUN=sum)[,-1])
For the Base variables
nm1 <- sprintf('Base%02d', 0:30)
Total[nm1] <- lapply(11:41, function(x) with(Total, base + base*dataval*x))
And for the Uplift variables
nm2 <- sprintf('Uplift%02d', 0:30)
Total[nm2] <- Total[paste0('cols',11:41)]-Total[nm1]
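The slope regressions from the question are not covered above. A minimal sketch in the same spirit follows. It assumes SBSNy is a data frame with 41 rows and columns SBS00 to SBS23, and that Numbers is 1:41, as described in the question. The random data here is a hypothetical stand-in, and idx and Slopes are names introduced only for illustration.

```r
# Hypothetical stand-ins for the asker's objects (not the real data)
set.seed(1)
Numbers <- 1:41
SBSNy <- as.data.frame(matrix(rnorm(41 * 24), nrow = 41))
names(SBSNy) <- sprintf('SBS%02d', 0:23)

# Elements used in every regression: 1:10 and 17:41
idx <- c(1:10, 17:41)

# Fit one regression per column and keep only the slope (second coefficient)
Slopes <- sapply(SBSNy[idx, ], function(y) coef(lm(y ~ Numbers[idx]))[[2]])
names(Slopes) <- paste0('Slope', names(SBSNy))
```

This returns one named numeric vector, so Slopes['SlopeSBS00'] takes the place of the separate SlopeSBS00 variable.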
# Example data used to test the code above
set.seed(24)
df1 <- as.data.frame(matrix(sample(0:9, 54*100, replace=TRUE), ncol=54))
set.seed(39)
Total <- cbind(df1,
    Hour=sample(sprintf('%02d:00:00', 0:23), 100, replace=TRUE),
    Group=sample(c('SBS', 'SBT', 'SBI'), 100, replace=TRUE),
    stringsAsFactors=FALSE)
dataval <- 5
colnames(Total)[1] <- 'base'
colnames(Total)[11:41] <- paste0('cols', 11:41)
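To confirm that the vectorised Base and Uplift assignments reproduce the hand-written lines from the question, a quick check on a much smaller, hypothetical data frame might look like the sketch below. Only two of the 31 columns are built, and the toy values are made up for illustration.

```r
# Toy data: hypothetical, much smaller than the real Total
Total <- data.frame(base = c(1, 2, 3),
                    cols11 = c(10, 20, 30),
                    cols12 = c(5, 6, 7))
dataval <- 5

# Same pattern as above, restricted to the multipliers 11 and 12
nm1 <- sprintf('Base%02d', 0:1)
Total[nm1] <- lapply(11:12, function(x) with(Total, base + base * dataval * x))
nm2 <- sprintf('Uplift%02d', 0:1)
Total[nm2] <- Total[paste0('cols', 11:12)] - Total[nm1]

# Compare with the hand-written versions from the question
stopifnot(isTRUE(all.equal(Total$Base00, Total$base + Total$base * dataval * 11)))
stopifnot(isTRUE(all.equal(Total$Base01, Total$base + Total$base * dataval * 12)))
stopifnot(isTRUE(all.equal(Total$Uplift00, Total$cols11 - Total$Base00)))
stopifnot(isTRUE(all.equal(Total$Uplift01, Total$cols12 - Total$Base01)))
```

If every stopifnot() passes silently, the two approaches agree on this toy input.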