
Time: 2017-12-11 07:30:18

Tags: r





  ID  type aid month year  Outcome ser_no Units  amt user_cnt  method
  1 Type1  A1   JAN 2016 Positive A00001   100 1000       20  cheque
  2 Type1  A1   JAN 2016 Positive A00001   200 1500       20      DD
  3 Type2  A2   FEB 2015 Negative A00002   300 2000       15      DD
  4 Type2  A2   FEB 2015 Negative A00002   400 3000       15  Cheque
  5 Type3  A3   MAR 2017   Medium A00003   500 4000       32 NetBank
  6 Type3  A3   MAR 2017   Medium A00003   600 2500       32 NetBank
  7 Type4  A4   APR 2018  Neutral A00004   700 6000       44     

Output<-sqldf("select type,aid,month,year,Outcome,ser_no,count(distinct ID) as members,count(type) as entries,sum(UNITS) as UNITS,sum(amt) as amt,
min(amt) as LowestAmt,max(amt) as HighestAmount,AVG(amt) as Mean,user_cnt,cast (count(distinct ID) as real)/user_cnt as Suggestion 
from data group by type,aid,month,year,Outcome,ser_no")


 type aid month year  Outcome ser_no members entries UNITS  amt LowestAmt HighestAmount Mean user_cnt Suggestion  
 Type1  A1   JAN 2016 Positive A00001       2       2   300 2500      1000          1500 1250       20 0.10000000     
 Type2  A2   FEB 2015 Negative A00002       2       2   700 5000      2000          3000 2500       15 0.13333333 
 Type3  A3   MAR 2017   Medium A00003       2       2  1100 6500      2500          4000 3250       32 0.06250000 
 Type4  A4   APR 2018  Neutral A00004       1       1   700 6000      6000          6000 6000       44 0.02272727 
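  • I want to add the method column from data to the output.

    The method column can take only four values: 1. Cheque 2. DD 3. NetBank 4. Blank, which means cash.

I want to add the method values to the output below (see the last four columns). Can this be done without sqldf? I am trying to count the occurrences of each method value within each group.

Example: under the GROUP BY clause, the first row has 1 Cheque and 1 DD value, so those counts are shown as 1; there are no Netbank or Cash values, so those counts are 0. Under the GROUP BY clause, the third row contains 2 Netbank values, so that count is shown as 2; there are no Cheque, DD, or Cash values, so those counts are 0.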

 type aid month year  Outcome ser_no members entries UNITS  amt LowestAmt HighestAmount Mean user_cnt Suggestion    Cheque      DD  Netbank     Cash
 Type1  A1   JAN 2016 Positive A00001       2       2   300 2500      1000          1500 1250       20 0.10000000      1         1      0         0
 Type2  A2   FEB 2015 Negative A00002       2       2   700 5000      2000          3000 2500       15 0.13333333      1         1      0         0
 Type3  A3   MAR 2017   Medium A00003       2       2  1100 6500      2500          4000 3250       32 0.06250000      0         0      2         0
 Type4  A4   APR 2018  Neutral A00004       1       1   700 6000      6000          6000 6000       44 0.02272727      0         0      0         1

3 answers:

Answer 0: (score: 2)


sqldf("select type
,count(distinct ID) as members
,count(type) as entries
,sum(UNITS) as UNITS
,sum(amt) as amt
,min(amt) as LowestAmt
,max(amt) as HighestAmount
,AVG(amt) as Mean
,cast (count(distinct ID) as real)/user_cnt as Suggestion
,count(case when lower(method)='cheque' then method  end ) as cheque 
,count(case when method ='DD' then method  end ) as DD 
,count(case when method ='NetBank' then method  end ) as NetBank 
,count(case when method is null or trim(method) = '' or lower(method) = 'cash' then 1 end) as Cash 
from data
group by type,aid,month,year,Outcome,ser_no")
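Since the question also asks whether this can be done without sqldf, the same conditional-counting idea can be sketched in base R: build one logical flag per method and sum the flags within each group. This is only a minimal sketch under stated assumptions. It rebuilds just the ID, type, and method columns of the sample data, groups by type alone for brevity, and assumes a blank or missing method means cash, as described in the question. The names m, flags, and counts are illustrative and do not come from the original post.

```r
# Sample data from the question, reduced to the columns needed here.
data <- data.frame(
  ID     = 1:7,
  type   = c("Type1", "Type1", "Type2", "Type2", "Type3", "Type3", "Type4"),
  method = c("cheque", "DD", "DD", "Cheque", "NetBank", "NetBank", ""),
  stringsAsFactors = FALSE
)

# Lower-case the method so "cheque" and "Cheque" are treated alike;
# treat NA like a blank entry (assumption: blank means cash).
m <- tolower(data$method)
m[is.na(m)] <- ""

# One logical flag per method, the base R counterpart of CASE WHEN.
flags <- data.frame(
  Cheque  = m == "cheque",
  DD      = m == "dd",
  NetBank = m == "netbank",
  Cash    = m %in% c("", "cash")
)

# Summing logicals counts the TRUE values within each group.
counts <- aggregate(flags, by = list(type = data$type), FUN = sum)
print(counts)
```

To group by the full key from the question, the remaining columns (aid, month, year, Outcome, ser_no) could be added to the by list in the same way.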

Answer 1: (score: 1)


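library(data.table)
DT <- setDT(data)
DT[, method := tolower(method)] # lower-case so "cheque" and "Cheque" are counted together
DT[is.na(method) | method == "", method := "cash"] # a blank method means cash
plouf <- dcast(DT[, .N, by = .(type, method)], type ~ method, value.var = "N", fill = 0L)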

    type cash cheque dd netbank
1: Type1   0     1    1      0
2: Type2   0     1    1      0
3: Type3   0     0    0      2
4: Type4   1     0    0      0

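Here DT[, .N, by = .(type, method)] counts the occurrences of each method within each type, and dcast converts the result to wide format. You can then merge it with the output: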

Output <- setDT(Output)
Output[plouf, on = "type"]

    type aid month year  Outcome ser_no members entries UNITS  amt LowestAmt HighestAmount Mean user_cnt Suggestion cash
1: Type1  A1   JAN 2016 Positive A00001       2       2   300 2500      1000          1500 1250       20 0.10000000    0
2: Type2  A2   FEB 2015 Negative A00002       2       2   700 5000      2000          3000 2500       15 0.13333333    0
3: Type3  A3   MAR 2017   Medium A00003       2       2  1100 6500      2500          4000 3250       32 0.06250000    0
4: Type4  A4   APR 2018  Neutral A00004       1       1   700 6000      6000          6000 6000       44 0.02272727    1
   cheque dd netbank
1:      1  1      0
2:      1  1      0
3:      0  0      2
4:      0  0      0
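The count-then-widen step of this answer can also be sketched in base R with table(), which cross-tabulates type against method in a single call. This is a rough sketch, not the answerer's code. It rebuilds only the ID, type, and method columns of the sample data and assumes a blank or missing method means cash. The names m and counts are illustrative.

```r
# Sample data from the question, reduced to the columns needed here.
data <- data.frame(
  ID     = 1:7,
  type   = c("Type1", "Type1", "Type2", "Type2", "Type3", "Type3", "Type4"),
  method = c("cheque", "DD", "DD", "Cheque", "NetBank", "NetBank", ""),
  stringsAsFactors = FALSE
)

# Normalise case and map blank or missing entries to "cash" (assumption).
m <- tolower(trimws(data$method))
m[is.na(m) | m == ""] <- "cash"

# Fixed factor levels guarantee all four columns, even when a count is zero.
m <- factor(m, levels = c("cheque", "dd", "netbank", "cash"))

# Cross-tabulate type against method and convert the table to a data frame.
counts <- as.data.frame.matrix(table(data$type, m))
counts$type <- rownames(counts)
rownames(counts) <- NULL
print(counts)
```

The resulting counts data frame could then be joined to the aggregated output with merge(Output, counts, by = "type"), assuming Output is the data frame produced by the query in the question.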

Answer 2: (score: 1)


library(data.table)
setDT(data)[
  , .(members = uniqueN(ID), entries = .N, UNITS = sum(Units), amt = sum(amt), 
      LowestAmt = min(amt), HighestAmount = max(amt), Mean = mean(amt), 
      user_cnt = first(user_cnt), Suggestion =  uniqueN(ID) / first(user_cnt),
      Cheque = sum(tolower(method) == "cheque"), DD = sum(tolower(method) == "dd"), 
      NetBank = sum(tolower(method) == "netbank"), 
      Cash = sum(tolower(method) %in% c("cash", ""))), 
  by = .(type, aid, month, year, Outcome, ser_no)]
    type aid month year  Outcome ser_no members entries UNITS  amt LowestAmt HighestAmount Mean user_cnt Suggestion Cheque DD NetBank Cash
1: Type1  A1   JAN 2016 Positive A00001       2       2   300 2500      1000          1500 1250       20 0.10000000      1  1       0    0
2: Type2  A2   FEB 2015 Negative A00002       2       2   700 5000      2000          3000 2500       15 0.13333333      1  1       0    0
3: Type3  A3   MAR 2017   Medium A00003       2       2  1100 6500      2500          4000 3250       32 0.06250000      0  0       2    0
4: Type4  A4   APR 2018  Neutral A00004       1       1   700 6000      6000          6000 6000       44 0.02272727      0  0       0    1
