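I'm looking for help writing a function that can identify the trend ("positive/negative/mixed", see the definitions below) in each customer's values in a dataset.
I have the following transaction data; every customer has between 3 and 13 transactions.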
customer_ID transaction_num sales
Josh 1 $35
Josh 2 $50
Josh 3 $65
Ray 1 $65
Ray 2 $52
Ray 3 $49
Ray 4 $15
Eric 1 $10
Eric 2 $13
Eric 3 $9
I want to write a function in R that populates a new data frame like this:
Customer_ID Sales_Slope
Josh Positive
Ray Negative
Eric Mixed
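where:
Josh's slope is positive, because his sales amount keeps increasing with every additional purchase.
Ray's slope is negative, because his sales amount keeps decreasing with every additional purchase.
Eric's slope is mixed, because his sales amounts fluctuate, with no clear trend.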
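I've tried pretty extensively to do this, but I'm stuck. Here is the logic I was able to put together (pseudocode, sketched here in R):
is_positive_trend <- function(sales) {  # sales sorted by transaction_num
  counter <- length(sales)  # start at max(transaction_num)
  while (counter >= 2) {
    if (sales[counter] <= sales[counter - 1]) return(FALSE)  # "not positive slope trend"
    counter <- counter - 1
  }
  TRUE
}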
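The same pairwise walk can be extended to detect both directions in one pass. A minimal sketch in base R, assuming one customer's sales arrive as a numeric vector already sorted by transaction_num; the name classify_trend is hypothetical:

```r
# Hypothetical helper: compare each transaction with the one before it
# (assumes the vector is sorted by transaction_num).
classify_trend <- function(sales) {
  always_up <- TRUE
  always_down <- TRUE
  for (i in seq_along(sales)[-1]) {
    if (sales[i] <= sales[i - 1]) always_up <- FALSE    # not a strictly positive step
    if (sales[i] >= sales[i - 1]) always_down <- FALSE  # not a strictly negative step
  }
  if (always_up) return("Positive")
  if (always_down) return("Negative")
  "Mixed"
}

classify_trend(c(35, 50, 65))      # "Positive"
classify_trend(c(65, 52, 49, 15))  # "Negative"
classify_trend(c(10, 13, 9))       # "Mixed"
```

Keeping two flags avoids looping twice; a step with equal sales clears both flags, so flat stretches end up as "Mixed".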
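Answer 0 (score: 2)
The simple answer is to use diff. It subtracts each value from the one after it, so if every value of diff(x) is above zero the series is increasing, and if every value is below zero it is decreasing. First, read in the data: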
# Read in some data.
data<-read.table(textConnection('customer_ID transaction_num sales
Josh 1 $35
Josh 2 $50
Josh 3 $65
Ray 1 $65
Ray 2 $52
Ray 3 $49
Ray 4 $15
Eric 1 $10
Eric 2 $13
Eric 3 $9'),header=TRUE,stringsAsFactors=FALSE)
data$sales<-as.numeric(sub('\\$','',data$sales))
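Before the function itself, it may help to see what diff returns for each customer. The vectors below are typed in by hand from the table in the question:

```r
# diff(x) is x[2:n] - x[1:(n-1)]: each value subtracted from the next one
josh <- c(35, 50, 65)
ray  <- c(65, 52, 49, 15)
eric <- c(10, 13, 9)

diff(josh)  # 15 15       -> all > 0, increasing
diff(ray)   # -13 -3 -34  -> all < 0, decreasing
diff(eric)  # 3 -4        -> signs differ, mixed
```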
Now the code:
# diff subtracts each value from the next one (x[i+1] - x[i]),
# so diff(c(1,2,3,4)) is c(1,1,1)
direction<-function(x){
if(all(diff(x)>0)) return('Increasing')
if(all(diff(x)<0)) return('Decreasing')
return('Mixed')
}
# If you want a vector.
c(by(data$sales,data$customer_ID,direction))
# Eric Josh Ray
# "Mixed" "Increasing" "Decreasing"
# If you want a small data frame.
aggregate(sales~customer_ID,data,direction)
# customer_ID sales
# 1 Eric Mixed
# 2 Josh Increasing
# 3 Ray Decreasing
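The diff approach assumes each customer's rows are already in transaction_num order. If the data might arrive shuffled, a minimal safeguard (a sketch on made-up toy rows, with direction redefined so the block runs on its own) is to sort before grouping:

```r
direction <- function(x) {
  if (all(diff(x) > 0)) return("Increasing")
  if (all(diff(x) < 0)) return("Decreasing")
  "Mixed"
}

# Toy data with rows deliberately out of transaction order
shuffled <- data.frame(
  customer_ID = c("Josh", "Josh", "Josh", "Ray", "Ray"),
  transaction_num = c(3, 1, 2, 2, 1),
  sales = c(65, 35, 50, 52, 65),
  stringsAsFactors = FALSE
)

# Sort by customer, then by transaction number, before computing diffs
sorted_data <- shuffled[order(shuffled$customer_ID, shuffled$transaction_num), ]
result <- aggregate(sales ~ customer_ID, sorted_data, direction)
result
```

Without the order() step, Josh's sales would be read as 65, 35, 50 ("Mixed") and Ray's as 52, 65 ("Increasing").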
Answer 1 (score: 2)
I think I'd start with something like this. For larger datasets, data.table is usually very efficient.
#Make fake data
library(data.table)
data <- data.table(customer_ID=c(rep("Josh",3),rep("Ray",4),rep("Eric",3)),
sales=c(35,50,65,65,52,49,15,10,13,9))
data[,transaction_num:=seq_len(.N),by=c("customer_ID")]
Now for the actual code.
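# Convert to a data.table (only needed if your data started as a data.frame)
data <- data.table(data)
#Difference between each sale and the next one (the last transaction gives NA)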
rolled.up <- data[,list(N.Minus.1=.N-1,Change=list(
sales[transaction_num+1]-sales[transaction_num])),
by=c("customer_ID")]
#Count positive changes; every other change (negative or zero) counts as negative
rolled.up[,Pos.Values:=as.numeric(lapply(Change,FUN=function(x) {sum(1*(x>0),na.rm=TRUE)}))]
rolled.up[,Neg.Values:=(N.Minus.1-Pos.Values)]
#Make Sales Slope variable
rolled.up[,Sales_Slope:=ifelse(Pos.Values>0 & Neg.Values==0,"Positive",
ifelse(Pos.Values==0 & Neg.Values>0,"Negative","Mixed"))]
#Make final table
final.table <- rolled.up[,list(customer_ID,Sales_Slope)]
final.table
# customer_ID Sales_Slope
# 1: Josh Positive
# 2: Ray Negative
# 3: Eric Mixed
#You can always merge this result back onto your main dataset if you want
data <- merge(x=data,y=final.table,by=c("customer_ID"),all.x=TRUE)
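The counting step is easiest to follow on a single customer. Below, Eric's sales are typed in by hand and run through the same expressions this answer uses inside data.table (the plain-vector variable names are my own):

```r
# Eric's sales, in transaction order
sales <- c(10, 13, 9)
transaction_num <- seq_along(sales)

# Same indexing as in the answer: the last position looks past the end -> NA
change <- sales[transaction_num + 1] - sales[transaction_num]
change  # 3 -4 NA

n_minus_1  <- length(sales) - 1                    # 2 comparisons
pos_values <- sum(1 * (change > 0), na.rm = TRUE)  # 1 increase
neg_values <- n_minus_1 - pos_values               # 1 non-increase
# Pos.Values > 0 and Neg.Values > 0, so Eric is labelled "Mixed"
```

The trailing NA is why na.rm is needed, and why Neg.Values is derived from N.Minus.1 rather than from the length of Change.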