I have a table containing customer name, payment month, and spend amount, as shown below:
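c_name p_month spend
ABC    201401  100
ABC    201402  150
ABC    201403  230
DEF    201401  110
DEF    201402  190
DEF    201403  300
I want to calculate the month-over-month change in spend (mom_change) and the month-over-month percentage change (mom_per_change) for each customer. The expected output is:
c_name p_month spend mom_change mom_per_change
ABC    201401  100   Blank      Blank
ABC    201402  150   50         0.5
ABC    201403  230   80         0.533
DEF    201401  110   Blank      Blank
DEF    201402  190   80         0.727
DEF    201403  300   110        0.578
I tried using a loop to calculate the changes for each customer separately. The problem is that there are about 10,000 customers, and computing this with a loop takes a lot of time. Any help is much appreciated. Thanks.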
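Answer 0 (score: 1)
This can be done using data.table and shift():
library(data.table)
dt <- data.table(c_name = c("ABC", "ABC", "ABC", "DEF", "DEF", "DEF"),
                 pmonth = c(201401, 201402, 201403, 201401, 201402, 201403),
                 spend  = c(100, 150, 230, 110, 190, 300))
# Change from the previous row within each customer
dt[, mom_change := spend - shift(spend), by = c_name]
# Relative change versus the previous row's spend
dt[, mom_per_change := (spend - shift(spend)) / shift(spend), by = c_name]
dt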
c_name pmonth spend mom_change mom_per_change
1: ABC 201401 100 NA NA
2: ABC 201402 150 50 0.5000000
3: ABC 201403 230 80 0.5333333
4: DEF 201401 110 NA NA
5: DEF 201402 190 80 0.7272727
6: DEF 201403 300 110 0.5789474
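One caveat: shift() works on row order within each group, so the result above is a month-over-month change only if the rows are already sorted by month for each customer. The following is a minimal sketch, not part of the original answer, that uses the same column names with deliberately shuffled rows (an assumption for illustration) and sorts with setorder() first. It also assumes every customer has a row for every month; gaps in the months are not handled.

```r
library(data.table)

# Same sample values as above, but with rows deliberately out of order
# (shuffled here only to illustrate why sorting matters)
dt <- data.table(c_name = c("ABC", "DEF", "ABC", "DEF", "ABC", "DEF"),
                 pmonth = c(201403, 201401, 201401, 201403, 201402, 201402),
                 spend  = c(230, 110, 100, 300, 150, 190))

# Sort by customer, then by month, so shift() sees the previous month's row
setorder(dt, c_name, pmonth)

# Compute both columns per customer on the sorted data
dt[, mom_change := spend - shift(spend), by = c_name]
dt[, mom_per_change := mom_change / shift(spend), by = c_name]
dt
```

Without the setorder() call, the differences would be taken between whatever rows happen to be adjacent within each customer.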
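Answer 1 (score: 1)
Here is a solution using data.table, with Blank replaced by NA:
library(data.table)
# Convert df to a data.table by reference, then add both columns per customer
setDT(df)[, `:=`(mom_change = c(NA, diff(spend)),
                 mom_per_change = round(c(NA, diff(spend)) / shift(spend), 3)),
          by = .(c_name)]
df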
c_name p_month spend mom_change mom_per_change
1: ABC 201401 100 NA NA
2: ABC 201402 150 50 0.500
3: ABC 201403 230 80 0.533
4: DEF 201401 110 NA NA
5: DEF 201402 190 80 0.727
6: DEF 201403 300 110 0.579
Answer 2 (score: 0)
A dplyr approach:
library(dplyr)
df %>%
  group_by(c_name) %>%
  mutate(mom_change = c(NA, diff(spend)),
         mom_per_change = (spend - lag(spend)) / lag(spend))
#Source: local data frame [6 x 5]
#Groups: c_name [2]
# c_name p_month spend mom_change mom_per_change
# (fctr) (int) (int) (dbl) (dbl)
#1 ABC 201401 100 NA NA
#2 ABC 201402 150 50 0.5000000
#3 ABC 201403 230 80 0.5333333
#4 DEF 201401 110 NA NA
#5 DEF 201402 190 80 0.7272727
#6 DEF 201403 300 110 0.5789474
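For comparison, the same grouped calculation can probably be done in base R without extra packages. This is a rough sketch, not taken from any of the answers: it rebuilds the sample data from the question as a data frame named df and uses ave() to get the previous row's spend within each customer. The helper name prev_spend is made up for illustration, and the rows are assumed to be sorted by month within each customer.

```r
# Sample data from the question (rebuilt here so the sketch is self-contained)
df <- data.frame(c_name  = c("ABC", "ABC", "ABC", "DEF", "DEF", "DEF"),
                 p_month = c(201401, 201402, 201403, 201401, 201402, 201403),
                 spend   = c(100, 150, 230, 110, 190, 300))

# Previous row's spend within each customer; NA for each customer's first row
# (prev_spend is a hypothetical helper name, not from the original thread)
prev_spend <- ave(df$spend, df$c_name, FUN = function(x) c(NA, head(x, -1)))

df$mom_change     <- df$spend - prev_spend
df$mom_per_change <- round(df$mom_change / prev_spend, 3)
df
```

Because ave() applies the function once per group and the arithmetic is vectorized, this avoids an explicit row-by-row loop over customers.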