Dataset
ID<-c(1,2,3,4,5,6,7)
method<-c("cheque","DD","DD","Cheque","NetBank","NetBank","Cash")
type<-c("Type1","Type1","Type2","Type2","Type3","Type3","Type4")
aid<-c("A1","A1","A2","A2","A3","A3","A4")
month<-c("JAN","JAN","FEB","FEB","MAR","MAR","APR")
year<-c(2016,2016,2015,2015,2017,2017,2018)
Outcome<-c("Positive","Positive","Negative","Negative","Medium","Medium","Neutral")
ser_no<-c("A00001","A00001","A00002","A00002","A00003","A00003","A00004")
Units<-c(100,200,300,400,500,600,700)
amt<-c(1000,1500,2000,3000,4000,2500,6000)
user_cnt<-c(20,20,15,15,32,32,44)
data<-data.frame(ID=ID,type=type,aid=aid,month=month,year=year,Outcome=Outcome,ser_no=ser_no,Units=Units,amt=amt,user_cnt=user_cnt,method=method)
R> data
ID type aid month year Outcome ser_no Units amt user_cnt method
1 Type1 A1 JAN 2016 Positive A00001 100 1000 20 cheque
2 Type1 A1 JAN 2016 Positive A00001 200 1500 20 DD
3 Type2 A2 FEB 2015 Negative A00002 300 2000 15 DD
4 Type2 A2 FEB 2015 Negative A00002 400 3000 15 Cheque
5 Type3 A3 MAR 2017 Medium A00003 500 4000 32 NetBank
6 Type3 A3 MAR 2017 Medium A00003 600 2500 32 NetBank
7 Type4 A4 APR 2018 Neutral A00004 700 6000 44
Output<-sqldf("select type,aid,month,year,Outcome,ser_no,count(distinct ID) as members,count(type) as entries,sum(UNITS) as UNITS,sum(amt) as amt,
min(amt) as LowestAmt,max(amt) as HighestAmount,AVG(amt) as Mean,user_cnt,cast (count(distinct ID) as real)/user_cnt as Suggestion
from data group by type,aid,month,year,Outcome,ser_no")
R> Output
type aid month year Outcome ser_no members entries UNITS amt LowestAmt HighestAmount Mean user_cnt Suggestion
Type1 A1 JAN 2016 Positive A00001 2 2 300 2500 1000 1500 1250 20 0.10000000
Type2 A2 FEB 2015 Negative A00002 2 2 700 5000 2000 3000 2500 15 0.13333333
Type3 A3 MAR 2017 Medium A00003 2 2 1100 6500 2500 4000 3250 32 0.06250000
Type4 A4 APR 2018 Neutral A00004 1 1 700 6000 6000 6000 6000 44 0.02272727
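I want to add the method column from data to the output.
The method column can take only four values: 1. Cheque, 2. DD, 3. NetBank, 4. blank, which means Cash.
I want to add the method counts to the output as shown below (check the last four columns). A solution without sqldf is fine too.
I am trying to find the number of occurrences of each method value within a group.
Example: per the GROUP BY clause, the first row has 1 Cheque and 1 DD value, so each of those counts is 1. NetBank and Cash values are absent, so their counts are 0.
Per the GROUP BY clause, the third row has 2 NetBank values, so that count is 2; there are no DD, Cash or Cheque values, so their counts are 0.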
type aid month year Outcome ser_no members entries UNITS amt LowestAmt HighestAmount Mean user_cnt Suggestion Cheque DD Netbank Cash
Type1 A1 JAN 2016 Positive A00001 2 2 300 2500 1000 1500 1250 20 0.10000000 1 1 0 0
Type2 A2 FEB 2015 Negative A00002 2 2 700 5000 2000 3000 2500 15 0.13333333 1 1 0 0
Type3 A3 MAR 2017 Medium A00003 2 2 1100 6500 2500 4000 3250 32 0.06250000 0 0 2 0
Type4 A4 APR 2018 Neutral A00004 1 1 700 6000 6000 6000 6000 44 0.02272727 0 0 0 1
Answer 0: (score: 2)
R's tolower() does not work inside sqldf, so the case mismatch between 'cheque' and 'Cheque' is handled with the SQL function lower() instead.
library(sqldf)
sqldf("select type
,aid
,month
,year
,Outcome
,ser_no
,count(distinct ID) as members
,count(type) as entries
,sum(UNITS) as UNITS
,sum(amt) as amt
,min(amt) as LowestAmt
,max(amt) as HighestAmount
,AVG(amt) as Mean
,user_cnt
,cast (count(distinct ID) as real)/user_cnt as Suggestion
,count(case when lower(method)='cheque' then method end ) as cheque
,count(case when method ='DD' then method end ) as DD
,count(case when method ='NetBank' then method end ) as NetBank
,count(case when method ='Cash' then method end ) as Cash
from data
group by type,aid,month,year,Outcome,ser_no")
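The question mentions that Cash may be stored as a blank, which a test on `method ='Cash'` alone would not count. Below is a minimal sketch of one way to cover both spellings, assuming blanks are empty strings rather than NA and that sqldf runs on its default SQLite backend. The `toy` data frame is made up for illustration and is not part of the original data.

```r
library(sqldf)

# Hypothetical toy data: two spellings of cheque, Cash given by name and as a blank
toy <- data.frame(
  type   = c("Type1", "Type1", "Type4", "Type4"),
  method = c("cheque", "Cheque", "Cash", ""),
  stringsAsFactors = FALSE
)

# count() ignores NULLs, so each CASE without an ELSE counts only matching rows
res <- sqldf("
  select type,
         count(case when lower(method) = 'cheque' then 1 end) as Cheque,
         count(case when lower(method) in ('cash', '') then 1 end) as Cash
  from toy
  group by type
  order by type")
print(res)
```

If blanks arrive as NA instead, they reach SQLite as NULL and would need an extra `method is null` condition.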
Answer 1: (score: 1)
With data.table:
library(data.table)
DT <- setDT(data)
DT[,method := tolower(method)] # to avoid different count with upper and lower case
plouf<-dcast(DT[,.N, by = .(type,method)],type~ method)
plouf[is.na(plouf)]<-0
type cash cheque dd netbank
1: Type1 0 1 1 0
2: Type2 0 1 1 0
3: Type3 0 0 0 2
4: Type4 1 0 0 0
Here DT[,.N, by = .(type,method)] counts the occurrences of each method per type, and dcast reshapes the result to wide format.
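As a small variation on the step above, dcast() can fill the missing combinations itself through its fill argument, so the separate NA replacement is not needed. The sketch below rebuilds only the two relevant columns of the question's data and lowercases method inside by, so the source table is left unmodified.

```r
library(data.table)

# Only the two columns needed for the count, copied from the question's data
DT <- data.table(
  type   = c("Type1", "Type1", "Type2", "Type2", "Type3", "Type3", "Type4"),
  method = c("cheque", "DD", "DD", "Cheque", "NetBank", "NetBank", "Cash")
)

# Count rows per (type, method) after normalising case
counts <- DT[, .N, by = .(type, method = tolower(method))]

# Reshape to wide; fill = 0L puts 0 where a (type, method) pair never occurs
wide <- dcast(counts, type ~ method, value.var = "N", fill = 0L)
print(wide)
```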
Then you can merge it with Output:
Output <- setDT(Output)
Output[plouf, on = "type"]
type aid month year Outcome ser_no members entries UNITS amt LowestAmt HighestAmount Mean user_cnt Suggestion cash
1: Type1 A1 JAN 2016 Positive A00001 2 2 300 2500 1000 1500 1250 20 0.10000000 0
2: Type2 A2 FEB 2015 Negative A00002 2 2 700 5000 2000 3000 2500 15 0.13333333 0
3: Type3 A3 MAR 2017 Medium A00003 2 2 1100 6500 2500 4000 3250 32 0.06250000 0
4: Type4 A4 APR 2018 Neutral A00004 1 1 700 6000 6000 6000 6000 44 0.02272727 1
cheque dd netbank
1: 1 1 0
2: 1 1 0
3: 0 0 2
4: 0 0 0
Answer 2: (score: 1)
The whole aggregation can be done with data.table:
library(data.table)
setDT(data)[
, .(members = uniqueN(ID), entries = .N, UNITS = sum(Units), amt = sum(amt),
LowestAmt = min(amt), HighestAmount = max(amt), Mean = mean(amt),
user_cnt = first(user_cnt), Suggestion = uniqueN(ID) / first(user_cnt),
Cheque = sum(tolower(method) == "cheque"), DD = sum(tolower(method) == "dd"),
NetBank = sum(tolower(method) == "netbank"),
Cash = sum(tolower(method) %in% c("cash", ""))),
by = .(type, aid, month, year, Outcome, ser_no)]
type aid month year Outcome ser_no members entries UNITS amt LowestAmt HighestAmount Mean user_cnt Suggestion Cheque DD NetBank Cash
1: Type1 A1 JAN 2016 Positive A00001 2 2 300 2500 1000 1500 1250 20 0.10000000 1 1 0 0
2: Type2 A2 FEB 2015 Negative A00002 2 2 700 5000 2000 3000 2500 15 0.13333333 1 1 0 0
3: Type3 A3 MAR 2017 Medium A00003 2 2 1100 6500 2500 4000 3250 32 0.06250000 0 0 2 0
4: Type4 A4 APR 2018 Neutral A00004 1 1 700 6000 6000 6000 6000 44 0.02272727 0 0 0 1
If method had more than 4 distinct values, I would suggest a different approach, e.g. dcast() and a join.
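A rough sketch of that dcast()-and-join idea follows. It assumes a shortened key set (`type`, `aid`) and a helper column `method_lc`, both introduced here for illustration only. The real grouping would use all six columns.

```r
library(data.table)

# Subset of the question's columns, enough to show the pattern
DT <- data.table(
  ID     = 1:7,
  type   = rep(c("Type1", "Type2", "Type3", "Type4"), times = c(2, 2, 2, 1)),
  aid    = rep(c("A1", "A2", "A3", "A4"), times = c(2, 2, 2, 1)),
  amt    = c(1000, 1500, 2000, 3000, 4000, 2500, 6000),
  method = c("cheque", "DD", "DD", "Cheque", "NetBank", "NetBank", "Cash")
)
keys <- c("type", "aid")  # illustrative subset of the grouping columns

# Normalise case once (method_lc is a helper column, not in the original data)
DT[, method_lc := tolower(method)]

# Ordinary aggregates per group
agg <- DT[, .(members = uniqueN(ID), amt = sum(amt)), by = keys]

# One count column per distinct method; length() of an empty cell gives 0
cnt <- dcast(DT, type + aid ~ method_lc, fun.aggregate = length, value.var = "ID")

# Join the counts onto the aggregates using all grouping columns
res <- agg[cnt, on = keys]
print(res)
```

Because the count columns come from the data, a new method value shows up as a new column without any change to the code. Joining on every grouping column avoids mismatches when type alone does not identify a group.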