I have a data frame df like the one below, which holds the aggregate points scored/lost over a few months.
name month agg_points
A 2017-04-01 1
B 2017-04-01 3
C 2017-04-01 0
A 2017-05-01 2
B 2017-05-01 5
C 2017-05-01 2
A 2017-06-01 4
B 2017-06-01 5
C 2017-06-01 1
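I need to find the net points gained/lost each month, that is, the current month's points minus the previous month's points. How can I access the previous month's points in df?
Expected output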
name month net_points
A 2017-04-01 1
B 2017-04-01 3
C 2017-04-01 0
A 2017-05-01 1
B 2017-05-01 2
C 2017-05-01 2
A 2017-06-01 2
B 2017-06-01 0
C 2017-06-01 -1
Answer 0 (score: 3)
With dplyr, you can use the lag function after group_by and arrange:
library(dplyr)
df %>%
group_by(name) %>%
arrange(month, .by_group = TRUE) %>%
mutate(net_points = agg_points - lag(agg_points, default = 0)) %>%
arrange(month)
#> # A tibble: 9 x 4
#> # Groups: name [3]
#> name month agg_points net_points
#> <chr> <chr> <int> <int>
#> 1 A 2017-04-01 1 1
#> 2 B 2017-04-01 3 3
#> 3 C 2017-04-01 0 0
#> 4 A 2017-05-01 2 1
#> 5 B 2017-05-01 5 2
#> 6 C 2017-05-01 2 2
#> 7 A 2017-06-01 4 2
#> 8 B 2017-06-01 5 0
#> 9 C 2017-06-01 1 -1
Data
df <- read.table(text = "name month agg_points
A 2017-04-01 1
B 2017-04-01 3
C 2017-04-01 0
A 2017-05-01 2
B 2017-05-01 5
C 2017-05-01 2
A 2017-06-01 4
B 2017-06-01 5
C 2017-06-01 1", header = TRUE, stringsAsFactors = FALSE)
Answer 1 (score: 1)
One way:
with(df, {
x <- xtabs(agg_points ~ month + name)
x[-1, ] <- diff(x)
as.data.frame(x, responseName = 'net_points')
})
# month name net_points
#1 2017-04-01 A 1
#2 2017-05-01 A 1
#3 2017-06-01 A 2
#4 2017-04-01 B 3
#5 2017-05-01 B 2
#6 2017-06-01 B 0
#7 2017-04-01 C 0
#8 2017-05-01 C 2
#9 2017-06-01 C -1
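The same per-name differencing can also be sketched in base R with ave(), keeping the original row layout. This is an alternative sketch, not the method from the answer above; it rebuilds the example data itself and assumes that sorting by month puts each name's rows in chronological order.

```r
# Example data from the question, rebuilt so the block is self-contained.
df <- data.frame(
  name = rep(c("A", "B", "C"), times = 3),
  month = rep(c("2017-04-01", "2017-05-01", "2017-06-01"), each = 3),
  agg_points = c(1L, 3L, 0L, 2L, 5L, 2L, 4L, 5L, 1L),
  stringsAsFactors = FALSE
)

# Assumption: ISO-formatted month strings sort chronologically.
df <- df[order(df$month, df$name), ]

# Within each name, keep the first value and take successive differences after it.
df$net_points <- ave(df$agg_points, df$name,
                     FUN = function(v) c(v[1], diff(v)))
df
```

ave() returns its result in the original row order, so no merge back onto df is needed.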
Answer 2 (score: 1)
You can create a temporary variable lag and use X3 - lag to get net_points.
library(readr)
df <- read_csv(
"A,2017-04-01,1
B,2017-04-01,3
C,2017-04-01,0
A,2017-05-01,2
B,2017-05-01,5
C,2017-05-01,2
A,2017-06-01,4
B,2017-06-01,5
C,2017-06-01,1",
col_names = FALSE
)
str(df)
library(dplyr)
df %>% group_by(X1) %>% mutate(lag = lag(X3), diff = ifelse(!is.na(lag), X3 - lag, X3)) %>%
select(-lag)
which gives
X1 X2 X3 diff
<chr> <date> <int> <int>
1 A 2017-04-01 1 1
2 B 2017-04-01 3 3
3 C 2017-04-01 0 0
4 A 2017-05-01 2 1
5 B 2017-05-01 5 2
6 C 2017-05-01 2 2
7 A 2017-06-01 4 2
8 B 2017-06-01 5 0
9 C 2017-06-01 1 -1
Answer 3 (score: 0)
I have a data.table equivalent of the accepted answer:
library(data.table)
DT <- setDT(df)
setkey(DT,month)
x <- DT[, list(netpoint = diff(agg_points), month = .SD[-1,month]),by = name]
x is a data.table holding the differences. Then we merge x and DT
DT <- x[DT, on = .(name,month)][,c("name","month","agg_points","netpoint")]
and fill in the first value of netpoint (which equals agg_points)
DT[,netpoint :={netpoint[1]<-agg_points[1]; netpoint},by=name]
which gives
name month agg_points netpoint
1: A 2017-04-01 1 1
2: B 2017-04-01 3 3
3: C 2017-04-01 0 0
4: A 2017-05-01 2 1
5: B 2017-05-01 5 2
6: C 2017-05-01 2 2
7: A 2017-06-01 4 2
8: B 2017-06-01 5 0
9: C 2017-06-01 1 -1
An approach closer to the accepted answer is:
DT <- setDT(df)
setkey(DT,month)
DT[,netpoint := agg_points - c(NA, agg_points[-.N]), by = name]
But I still need to do
DT[,netpoint :={netpoint[1]<-agg_points[1]; netpoint},by=name]
to fill in the first row, which bugs me. Does anyone have a better way?
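One possible way to avoid the separate first-row fix, offered as a sketch rather than a definitive answer: data.table's shift() takes a fill argument, so the missing previous value for the first month in each group can be set to 0 directly. The block rebuilds the example data itself and assumes agg_points is an integer column, hence fill = 0L.

```r
library(data.table)

# Example data from the question, rebuilt so the block is self-contained.
DT <- data.table(
  name = rep(c("A", "B", "C"), times = 3),
  month = rep(c("2017-04-01", "2017-05-01", "2017-06-01"), each = 3),
  agg_points = c(1L, 3L, 0L, 2L, 5L, 2L, 4L, 5L, 1L)
)

# Sort by month so the lag within each name refers to the previous month.
setorder(DT, month, name)

# shift() lags by one row per group; fill = 0L stands in for the
# non-existent month before the first one, so no second pass is needed.
DT[, netpoint := agg_points - shift(agg_points, n = 1L, fill = 0L, type = "lag"),
   by = name]
DT
```

This mirrors lag(agg_points, default = 0) from the dplyr answer.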