I have a data table like this:
ID DAYS FREQUENCY
"ads" 20 3
"jwa" 45 2
"mno" 4 1
"ads" 13 3
"jwa" 60 2
"ads" 18 3
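I want to add a column that, within each ID, subtracts from each row's DAYS the nearest smaller DAYS value for that same ID. My new table would look like this: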
ID DAYS FREQUENCY DAYS_DIFF
"ads" 20 3 2 (because 20-18)
"jwa" 45 2 NA (because there is no value smaller than 45 for that ID)
"mno" 4 1 NA
"ads" 13 3 NA
"jwa" 60 2 15
"ads" 18 3 5
Bonus: is there a way to do this with the merge function?
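On the bonus question, a self-join with base R's merge is one possibility. The following is only a minimal sketch under some assumptions: the names `df`, `row_order`, `candidates`, `PREV_DAYS` and `DAYS_DIFF` are made up for illustration, and ties in DAYS within an ID are ignored because only strictly smaller values count as candidates.

```r
# Example data from the question
df <- data.frame(
  ID = c("ads", "jwa", "mno", "ads", "jwa", "ads"),
  DAYS = c(20, 45, 4, 13, 60, 18),
  FREQUENCY = c(3, 2, 1, 3, 2, 3),
  stringsAsFactors = FALSE
)
df$row_order <- seq_len(nrow(df))  # hypothetical helper column to restore row order

# Self-join: pair every row with every DAYS value that shares its ID
candidates <- data.frame(ID = df$ID, PREV_DAYS = df$DAYS, stringsAsFactors = FALSE)
pairs <- merge(df, candidates, by = "ID")

# Keep only strictly smaller candidates, then take the largest one per original row
pairs <- pairs[pairs$PREV_DAYS < pairs$DAYS, ]
nearest <- aggregate(PREV_DAYS ~ row_order, data = pairs, FUN = max)

# Left join back so rows without a smaller value get NA
result <- merge(df, nearest, by = "row_order", all.x = TRUE)
result$DAYS_DIFF <- result$DAYS - result$PREV_DAYS
result <- result[order(result$row_order), c("ID", "DAYS", "FREQUENCY", "DAYS_DIFF")]
print(result)
```

The self-join grows quadratically with the number of rows per ID, so this is probably only reasonable for small groups.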
Answer 0 (score: 1)
Here is an answer using dplyr:
library(dplyr)
# mydata is assumed to hold the table from the question
mydata %>%
  mutate(row.order = row_number()) %>%  # row numbers added to preserve original row order
  group_by(ID) %>%
  arrange(DAYS) %>%
  mutate(prev.days = lag(DAYS)) %>%     # next smaller DAYS value within each ID
  mutate(days.diff = DAYS - prev.days) %>%
  ungroup() %>%
  arrange(row.order) %>%
  select(ID, DAYS, FREQUENCY, days.diff)
Output:
ID DAYS FREQUENCY days.diff
<fctr> <int> <int> <int>
1 ads 20 3 2
2 jwa 45 2 NA
3 mno 4 1 NA
4 ads 13 3 NA
5 jwa 60 2 15
6 ads 18 3 5
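A possibly shorter variant of the same idea, shown only as a sketch: dplyr's `lag()` accepts an `order_by` argument, which should make the helper row-number column and the two `arrange()` calls unnecessary. The `mydata` definition below is an assumption reconstructed from the question's table, and `result` is a name chosen for illustration.

```r
library(dplyr)

# Assumed reconstruction of the question's data
mydata <- data.frame(
  ID = c("ads", "jwa", "mno", "ads", "jwa", "ads"),
  DAYS = c(20L, 45L, 4L, 13L, 60L, 18L),
  FREQUENCY = c(3L, 2L, 1L, 3L, 2L, 3L)
)

result <- mydata %>%
  group_by(ID) %>%
  # lag() is evaluated as if rows were sorted by DAYS, but the original row order is kept
  mutate(days.diff = DAYS - dplyr::lag(DAYS, order_by = DAYS)) %>%
  ungroup()

print(result)
```

Because the rows are never physically reordered, the result keeps the original row order without extra bookkeeping.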
Answer 1 (score: 0)
You can do this with dplyr and a quick loop:
library(dplyr)
# Row-wise data frame creation, so the example data can be copy-pasted as is
# (tribble() replaces the deprecated frame_data())
df <- tibble::tribble(
  ~ID, ~DAYS, ~FREQUENCY,
  "ads", 20, 3,
  "jwa", 45, 2,
  "mno", 4, 1,
  "ads", 13, 3,
  "jwa", 60, 2,
  "ads", 18, 3
)
# Subtract from each element of a numeric vector the element that follows it
rolling_subtraction <- function(x) {
  out <- vector('numeric', length(x))
  for (i in seq_along(out)) {
    out[[i]] <- x[i] - x[i + 1] # x[i + 1] is NA if the index is out of bounds
  }
  out
}
# Arrange data.frame in order of ID / Days and apply rolling subtraction
df %>%
  arrange(ID, desc(DAYS)) %>%
  group_by(ID) %>%
  mutate(days_diff = rolling_subtraction(DAYS))
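The loop above could probably be replaced by a vectorized expression. The sketch below uses base R only; `rolling_subtraction_vec` is a hypothetical name, and the data frame is an assumed reconstruction of the example data. Note that, as in the pipeline above, the rows end up sorted by ID and descending DAYS rather than in their original order.

```r
# Vectorized variant: shift the vector left by one and pad with NA
rolling_subtraction_vec <- function(x) {
  x - c(x[-1], NA)
}

# Assumed reconstruction of the example data
df <- data.frame(
  ID = c("ads", "jwa", "mno", "ads", "jwa", "ads"),
  DAYS = c(20, 45, 4, 13, 60, 18),
  FREQUENCY = c(3, 2, 1, 3, 2, 3),
  stringsAsFactors = FALSE
)

# Sort by ID, then by descending DAYS
df <- df[order(df$ID, -df$DAYS), ]

# Apply the subtraction within each ID group
df$days_diff <- ave(df$DAYS, df$ID, FUN = rolling_subtraction_vec)
print(df)
```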