I want to optimize this code with a for loop so that it scales to larger datasets.
library(reshape2)
Customer <- c("Susan", "Louis", "Frank", "Susan")
Seller <- c("Ivan", "Donald", "Chris", "Ivan")
Service <- c("COU", "CAR", "FCL", "CAR")
Billingmean <- c(100, 200, 300, 400)
WrsHoldSum <- c(0, 0, 0, 0)
Group <- c("n1", "n2", " ", " ")
B1 <- c(0, 2, 2, 1)
B2 <- c(9, 8, 7, 6)
B3 <- c(5, 4, 3, 2)
df <- data.frame(Customer, Seller, Service, Billingmean, WrsHoldSum, Group, B1, B2, B3)
sub1 <- dcast(data = df, formula = Customer + Group + Seller + WrsHoldSum ~ Service, fun.aggregate = sum, value.var = "Billingmean")
sub2 <- dcast(data = df, formula = Customer + Group + Seller + WrsHoldSum ~ Service, fun.aggregate = sum, value.var = "B1")
sub3 <- dcast(data = df, formula = Customer + Group + Seller + WrsHoldSum ~ Service, fun.aggregate = sum, value.var = "B2")
sub4 <- dcast(data = df, formula = Customer + Group + Seller + WrsHoldSum ~ Service, fun.aggregate = sum, value.var = "B3")
finaldf <- merge(sub1, sub2, sub3, sub4, by = c("Customer", "Group", "Seller", "WrsHoldSum"))
Answer 0 (score: 0)
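Since you also want to know which table each column came from (for the columns that are not part of the by argument of merge), you can rename those columns with lapply() before calling Reduce(). lapply() conveniently returns a list, which keeps the Reduce() call short.
First, collect the names of the tables to pass to lapply():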
tNames <- grep(x = ls(), pattern = "^sub", value = TRUE)
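Then use lapply() with a custom function that returns each table with the relevant column names modified. The pipe %>% passes the resulting list to Reduce(), which combines the tables pairwise with merge():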
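library(magrittr)  # provides the pipe %>%
# copy() and setnames() come from data.table; they are called with data.table::
# so that attaching data.table does not mask reshape2::dcast
lapply(seq_along(tNames), function(x) {
  tSym <- as.name(tNames[[x]])
  d1 <- data.table::copy(eval(tSym))  # work on a copy, leave the original sub* table untouched
  # value columns created by dcast, one per Service level
  cols <- grep(x = names(d1), pattern = "^CAR|^COU|^FCL", value = TRUE)
  data.table::setnames(d1, old = cols, new = paste0(cols, " B", x))  # tag with the table index
  return(d1)
}) %>% Reduce(function(x, y) merge(x, y, by = c("Customer", "Group", "Seller", "WrsHoldSum")), .)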
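Result (the suffixes B1 to B4 are the index of the source table, sub1 to sub4, so the "B1" columns hold the Billingmean values and "B2" to "B4" hold the original B1 to B3 values):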
  Customer Group Seller WrsHoldSum CAR B1 COU B1 FCL B1 CAR B2 COU B2 FCL B2 CAR B3 COU B3 FCL B3 CAR B4 COU B4 FCL B4
1    Frank        Chris          0      0      0    300      0      0      2      0      0      7      0      0      3
2    Louis    n2 Donald          0    200      0      0      2      0      0      8      0      0      4      0      0
3    Susan         Ivan          0    400      0      0      1      0      0      6      0      0      2      0      0
4    Susan    n1   Ivan          0      0    100      0      0      0      0      0      9      0      0      5      0
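As a variation on the same idea, the intermediate tables do not have to be created by hand at all. The following is only a sketch, assuming the df from the question: it loops over the value columns with lapply(), builds one wide table per column, tags the Service columns with the name of the value column they came from, and merges everything with Reduce(). The names keys, valueCols, and wideTables, and the "_" separator in the new column names, are illustrative choices that do not appear in the question.

```r
library(reshape2)

# Data from the question
df <- data.frame(
  Customer    = c("Susan", "Louis", "Frank", "Susan"),
  Seller      = c("Ivan", "Donald", "Chris", "Ivan"),
  Service     = c("COU", "CAR", "FCL", "CAR"),
  Billingmean = c(100, 200, 300, 400),
  WrsHoldSum  = c(0, 0, 0, 0),
  Group       = c("n1", "n2", " ", " "),
  B1 = c(0, 2, 2, 1),
  B2 = c(9, 8, 7, 6),
  B3 = c(5, 4, 3, 2)
)

# Hypothetical helper names, chosen for this sketch
keys <- c("Customer", "Group", "Seller", "WrsHoldSum")
valueCols <- c("Billingmean", "B1", "B2", "B3")

# One wide table per value column; the Service columns are tagged
# with the value column they were aggregated from
wideTables <- lapply(valueCols, function(v) {
  wide <- dcast(df, Customer + Group + Seller + WrsHoldSum ~ Service,
                fun.aggregate = sum, value.var = v)
  isValue <- !(names(wide) %in% keys)
  names(wide)[isValue] <- paste(names(wide)[isValue], v, sep = "_")
  wide
})

# merge() joins two tables at a time, so Reduce() folds it over the list
finaldf <- Reduce(function(x, y) merge(x, y, by = keys), wideTables)
```

With this naming, a column such as CAR_B2 states directly which original column it was aggregated from, and handling a larger dataset with more value columns only requires extending valueCols.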