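I want to add a column that counts consecutive values. Most of what I have found here covers counting repeated values (1,1,1,1,1); I want the count to go up whenever the number increases by 1 (5,6,7,8,9). The ID column is what I have, and the Counter column is the one I want to create. Thanks!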
ID Counter
5 1
6 2
7 3
8 4
10 1
11 2
13 1
14 2
15 3
16 4
Answer 0 (score: 1)
The loop version is simple:
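counter <- rep(1L, length(ID))          # every run starts at 1
for (i in seq_along(ID)[-1]) {          # safe even when length(ID) < 2
  if (ID[i] - ID[i - 1] == 1) {
    counter[i] <- counter[i - 1] + 1L   # still consecutive: keep counting
  } else {
    counter[i] <- 1L                    # gap: restart the counter
  }
}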
However, for n > 10^4 this loop will perform very poorly! I'll try to come up with a vectorized solution.
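One possible vectorized version, as a minimal sketch in base R: it marks where each run starts, then lets `rle()` and `sequence()` number the elements within each run. The helper name `run_counter` is hypothetical and not from the answer above.

```r
# Hypothetical helper (name is an assumption, not from the original answer).
run_counter <- function(ID) {
  if (length(ID) == 0L) return(integer(0))
  # A new run starts at the first element and wherever the step is not exactly 1
  run_id <- cumsum(c(TRUE, diff(ID) != 1))
  # rle() yields the length of each run; sequence() expands each length n to 1..n
  sequence(rle(run_id)$lengths)
}

ID <- c(5, 6, 7, 8, 10, 11, 13, 14, 15, 16)
counter <- run_counter(ID)
print(counter)
```

This avoids the explicit loop, so the work is done by a handful of vectorized calls.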
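Answer 1 (score: 1)
A solution using the dplyr package. The idea is to compute the difference between consecutive values to create a grouping column, then assign a counter within each group.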
library(dplyr)

dat2 <- dat %>%
  mutate(Diff = ID - lag(ID, default = 0),
         Group = cumsum(Diff != 1)) %>%
  group_by(Group) %>%
  mutate(Counter = row_number()) %>%
  ungroup() %>%
  select(-Diff, -Group)
dat2
# # A tibble: 10 x 2
# ID Counter
# <int> <int>
# 1 5 1
# 2 6 2
# 3 7 3
# 4 8 4
# 5 10 1
# 6 11 2
# 7 13 1
# 8 14 2
# 9 15 3
# 10 16 4
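To make the grouping idea easier to follow, here is a small sketch that keeps the intermediate `Diff` and `Group` columns visible. It uses base R instead of dplyr (an assumption made only so the snippet runs without extra packages), and the data frame name `steps` is hypothetical.

```r
ID <- c(5L, 6L, 7L, 8L, 10L, 11L, 13L, 14L, 15L, 16L)
steps <- data.frame(ID = ID)

# Difference to the previous value; the first row is compared against 0,
# mirroring lag(ID, default = 0)
steps$Diff <- ID - c(0L, head(ID, -1))

# Every Diff that is not 1 opens a new group
steps$Group <- cumsum(steps$Diff != 1)

# Number the rows within each group
steps$Counter <- ave(steps$ID, steps$Group, FUN = seq_along)

print(steps)
```

Printing `steps` shows that rows sharing a `Group` value are exactly the runs of consecutive IDs.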
Data
dat <- read.table(text = "ID
5
6
7
8
10
11
13
14
15
16",
header = TRUE, stringsAsFactors = FALSE)
Answer 2 (score: 1)
You can use
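library(data.table)                    # assumption: shift() here is data.table::shift
s <- df$ID - shift(df$ID)              # difference to the previous value (NA for the first row)
s[is.na(s)] <- 1                       # treat the first row as part of the first run
ave(s, cumsum(s != 1), FUN = seq_along)
# [1] 1 2 3 4 1 2 1 2 3 4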
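Answer 3 (score: 1)
This approach uses only efficient vectorized operations. The idea is as follows:
1. Take the cumulative sum of the differences of ID.
2. Wherever the jump is greater than 1, subtract that value.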
cum <- c(0, cumsum(diff(ID)))        # cumulative sum of the differences of ID
ccm <- cum * c(1, (diff(ID) > 1))    # keep the value only where the jump is > 1
# subtract the value at the most recent jump > 1 from all following elements
# (fill-forward trick; see the reference in the notes below)
# note: rep(0, n) pads the leading run, because ccm[...] starts at the first nonzero value
counter <- cum - c(rep(0, which(diff(ID) != 1)[1]),
                   ccm[which(ccm != 0)][cumsum(ccm != 0)]) + 1
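Notes:
Reference to nacnudus's efficient fill function: Fill in data frame with values from rows above
Limitation: ID must be monotonically increasing
That should handle your millions of rows efficiently!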
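The same "subtract the value at the last jump" idea can also be sketched with `cummax()` doing the fill-forward. This is a swapped-in variant, not the code from this answer: it works on positions rather than on cumulative differences, and it also covers input with no jump at all, which the indexing version above does not appear to handle (`which(...)[1]` would be `NA` there). It assumes a non-empty ID vector.

```r
ID <- c(5, 6, 7, 8, 10, 11, 13, 14, 15, 16)

idx <- seq_along(ID)
# TRUE at the first element and wherever the step is not exactly 1
is_start <- c(TRUE, diff(ID) != 1)
# Position of the most recent run start, carried forward by cummax()
start_pos <- cummax(idx * is_start)
# Counter = distance from the run start, 1-based
counter <- idx - start_pos + 1L

print(counter)
```

Because positions always increase, `cummax()` carries the latest run start forward without any explicit indexing.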
Answer 4 (score: 0)
Another solution:
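breaks <- c(which(diff(ID) != 1), length(ID))   # last position of each run
x <- c(breaks[1], diff(breaks))                 # length of each run
sequence(x)   # 1..n per run; unlist(sapply(x, seq_len)) returns a matrix when all runs have equal length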
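A quick sanity check, as a sketch: it compares the break-based counter with a plain loop on randomly generated, strictly increasing IDs. The seed, the sample size, and the variable names `fast` and `slow` are arbitrary choices for illustration.

```r
set.seed(42)  # arbitrary seed, only for reproducibility
ID <- cumsum(sample(1:3, 1000, replace = TRUE))  # strictly increasing, steps of 1 to 3

# Break-based counter, following this answer (lapply always returns a list)
breaks <- c(which(diff(ID) != 1), length(ID))
x <- c(breaks[1], diff(breaks))
fast <- unlist(lapply(x, seq_len))

# Reference: straightforward loop
slow <- rep(1L, length(ID))
for (i in seq_along(ID)[-1]) {
  if (ID[i] - ID[i - 1] == 1) slow[i] <- slow[i - 1] + 1L
}

print(identical(fast, slow))
```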