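I want to count, in the up column, how many times a 1 is followed by a 1, how many times a 1 is followed by a 0, how many times two consecutive 1s are followed by a 1, and so on.
Here is my dataset: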
library(lubridate)
set.seed(321)
df <- data.frame(seq(ymd_h("2017-01-01-00"), ymd_h("2020-01-31-24"), by = "hours"))
df$close <- rnorm(nrow(df), 3000, 150)
df$up <- ifelse(sign(rnorm(nrow(df))) == -1, 0, 1)
colnames(df) <- c("date", "close", "up")
df$date <- as.POSIXct(df$date, format = "%Y-%m-%d %H:%M:%S")
df$hour <- hour(df$date)
df$day <- day(df$date)
df$month <- month(df$date)
df$year <- year(df$date)
How can I do this with this dataset?
Answer 0 (score: 2)
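We can create count_01 as the sum of a compound logical expression that checks whether the current value is 1 and the previous value is 0. Then group by the rleid of 'up' (keeping 'count_01' as a grouping column) and use summarise to return the logical columns 'count_11' and 'count_111', which are TRUE only when the run has exactly 2 or 3 rows, respectively, and all values of 'up' in it are 1. Finally, take the sum of those columns to get the counts. Note that 'count_11' and 'count_111' count runs of exactly two and exactly three 1s, so a run of three 1s is not also counted as a run of two.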
library(dplyr)
library(data.table)  # for rleid()
df %>%
  # 0 -> 1 transitions: current value is 1 and the previous value is 0
  mutate(count_01 = sum(up == 1 & lag(up) == 0, na.rm = TRUE)) %>%
  # one group per run of identical consecutive values in 'up'
  group_by(group = rleid(up), count_01) %>%
  summarise(count_11 = n() == 2 & all(up == 1),
            count_111 = n() == 3 & all(up == 1), .groups = 'drop') %>%
  summarise(count_01 = first(count_01), count_11 = sum(count_11),
            count_111 = sum(count_111))
# A tibble: 1 x 3
# count_01 count_11 count_111
# <int> <int> <int>
#1 6657 1722 794
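The grouping step is easier to see on a tiny vector. The following is only a sketch on a made-up vector x (not the question's data); the run ids are built with base rle here, which should give the same ids as rleid(x) for this input.

```r
# Made-up toy vector for illustration, not the question's data
x <- c(0, 1, 1, 0, 1, 1, 1, 0, 1)

# Run-length encoding: one entry per run of identical consecutive values
r <- rle(x)

# One integer id per run, repeated for every element of that run
run_id <- rep(seq_along(r$lengths), r$lengths)
print(run_id)

# Runs made of 1s whose length is exactly 2, and exactly 3
runs_of_two <- sum(r$values == 1 & r$lengths == 2)
runs_of_three <- sum(r$values == 1 & r$lengths == 3)
print(c(runs_of_two = runs_of_two, runs_of_three = runs_of_three))
```

Here x has one run of exactly two 1s and one run of exactly three 1s; the trailing single 1 is counted in neither.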
Or, using base R:
with(df, sum(up[-1] == 1 & up[-length(up)] == 0))
#[1] 6657
rl <- rle(df$up == 1)
sum(rl$lengths == 3 & rl$values)
#[1] 794
sum(rl$lengths == 2 & rl$values)
#[1] 1722
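The run-based counts above treat each run as a whole. If what is wanted is instead every position where a pattern occurs, including overlapping occurrences inside a longer run (one possible reading of "1 after 1", "0 after 1", "1 after two 1s"), a sliding-window count may be closer. The following is a minimal sketch under that assumption; count_pattern is a hypothetical helper and x is a made-up vector, neither comes from the original answer.

```r
# Hypothetical helper: count every (possibly overlapping) occurrence
# of `pattern` as a consecutive subsequence of `x`
count_pattern <- function(x, pattern) {
  k <- length(pattern)
  if (length(x) < k) return(0L)
  # embed() puts the newest value first in each row, so reverse the
  # columns to get each window in chronological order
  windows <- embed(x, k)[, k:1, drop = FALSE]
  sum(apply(windows, 1, function(w) all(w == pattern)))
}

# Made-up toy vector for illustration
x <- c(0, 1, 1, 0, 1, 1, 1, 0, 1)

n_11  <- count_pattern(x, c(1, 1))     # a 1 followed by a 1
n_10  <- count_pattern(x, c(1, 0))     # a 1 followed by a 0
n_111 <- count_pattern(x, c(1, 1, 1))  # two 1s followed by a 1
n_110 <- count_pattern(x, c(1, 1, 0))  # two 1s followed by a 0
print(c(n_11 = n_11, n_10 = n_10, n_111 = n_111, n_110 = n_110))
```

On the question's data this would be called as count_pattern(df$up, c(1, 1)) and so on. Unlike the rle approach, a run of three 1s contributes two occurrences of the pattern 1, 1 here, so the two methods answer slightly different questions.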