I have a data frame:
dat <- read.table(text = "
YEAR MONTH DAY PCP SPELL
1950 12 28 0 DRY
1950 12 29 11.7 WET
1950 12 30 0 DRY
1950 12 31 0 DRY
1951 01 01 0 DRY
1951 01 02 0 DRY
1951 01 03 20.3 WET
", header = TRUE)
I create groups by year and month:
library(tidyverse)
groups <- dat %>% group_by(YEAR, MONTH) %>% summarise(NUM = n(), .groups = "drop")
# one ID per YEAR-MONTH combination
groups$ID <- seq_len(nrow(groups))
dat %>% left_join(groups, by = c("YEAR", "MONTH"))
and apply this script:
dfx <- data.frame(dat, svalue = NA)
dfx$svalue[1] <- ifelse(dfx$SPELL[1] == "DRY", 1, 0)
# running count of consecutive dry days; a wet day resets it to 0
for (i in 2:nrow(dfx)) {
  dfx$svalue[i] <- ifelse(dfx$SPELL[i] == "DRY", dfx$svalue[i - 1] + 1, 0)
}
This gives:
YEAR MONTH DAY PCP SPELL svalue
1950 12 28 0 DRY 1
1950 12 29 11.7 WET 0
1950 12 30 0 DRY 1
1950 12 31 0 DRY 2
1951 01 01 0 DRY 3
1951 01 02 0 DRY 4
1951 01 03 20.3 WET 0
How can I make the count restart for each year and month?
I need to get this:
YEAR MONTH DAY PCP SPELL svalue
1950 12 28 0 DRY 1
1950 12 29 11.7 WET 0
1950 12 30 0 DRY 1
1950 12 31 0 DRY 2
1951 01 01 0 DRY 1
1951 01 02 0 DRY 2
1951 01 03 20.3 WET 0
Alternatively, how could I apply dw.spell from the RMRAINGEN package, split by year and month?
Thanks.
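Answer
Based on the expected output, you can add another grouping variable built from a logical vector on 'svalue': on the non-zero rows, cumsum(svalue == 1) increments at the start of every dry spell, and grouping by it together with YEAR and MONTH restarts the count at each month boundary.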
library(data.table)
setDT(dfx)[svalue != 0, svalue := seq_len(.N), .(cumsum(svalue == 1), YEAR, MONTH)]
dfx
# YEAR MONTH DAY PCP SPELL svalue
#1: 1950 12 28 0.0 DRY 1
#2: 1950 12 29 11.7 WET 0
#3: 1950 12 30 0.0 DRY 1
#4: 1950 12 31 0.0 DRY 2
#5: 1951 1 1 0.0 DRY 1
#6: 1951 1 2 0.0 DRY 2
#7: 1951 1 3 20.3 WET 0
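If data.table is not available, the same idea might be sketched in base R with `ave()`: build the helper spell id with `cumsum(svalue == 1)` on the non-zero rows, then number the rows within each spell, YEAR and MONTH. This is a sketch that assumes `dfx` already holds the running `svalue` from the question's loop; only the columns it needs are recreated here.

```r
# Only the columns used here; svalue is the running count from the question's loop
dfx <- data.frame(
  YEAR   = c(1950L, 1950L, 1950L, 1950L, 1951L, 1951L, 1951L),
  MONTH  = c(12L, 12L, 12L, 12L, 1L, 1L, 1L),
  SPELL  = c("DRY", "WET", "DRY", "DRY", "DRY", "DRY", "WET"),
  svalue = c(1L, 0L, 1L, 2L, 3L, 4L, 0L)
)

dry <- dfx$svalue != 0
# helper group: a new dry spell starts wherever the running count equals 1
spell_id <- cumsum(dfx$svalue[dry] == 1)
# restart the count within each (spell, YEAR, MONTH) combination
dfx$svalue[dry] <- ave(dfx$svalue[dry], spell_id, dfx$YEAR[dry], dfx$MONTH[dry],
                       FUN = seq_along)
dfx$svalue
# [1] 1 0 1 2 1 2 0
```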
Or group by the run-length id of 'SPELL':
setDT(dfx)[, svalue := seq_len(.N) * (svalue != 0), .(rleid(SPELL), YEAR, MONTH)]
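A base-R sketch of the run-length-id approach, assuming data.table is not used: `rleid()` is replaced by a run id built with `cumsum()` over the positions where SPELL changes, and the `svalue != 0` mask is swapped for `SPELL == "DRY"` so that it works on the raw data without running the loop first.

```r
dfx <- data.frame(
  YEAR  = c(1950L, 1950L, 1950L, 1950L, 1951L, 1951L, 1951L),
  MONTH = c(12L, 12L, 12L, 12L, 1L, 1L, 1L),
  SPELL = c("DRY", "WET", "DRY", "DRY", "DRY", "DRY", "WET")
)

n <- nrow(dfx)
# run id: increments whenever SPELL differs from the previous row (like rleid)
run <- cumsum(c(TRUE, dfx$SPELL[-1] != dfx$SPELL[-n]))
# position within each (run, YEAR, MONTH) group, set to 0 on wet days
dfx$svalue <- ave(seq_len(n), run, dfx$YEAR, dfx$MONTH, FUN = seq_along) *
  (dfx$SPELL == "DRY")
dfx$svalue
# [1] 1 0 1 2 1 2 0
```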
# data
dfx <- structure(list(YEAR = c(1950L, 1950L, 1950L, 1950L, 1951L, 1951L,
1951L), MONTH = c(12L, 12L, 12L, 12L, 1L, 1L, 1L), DAY = c(28L,
29L, 30L, 31L, 1L, 2L, 3L), PCP = c(0, 11.7, 0, 0, 0, 0, 20.3
), SPELL = c("DRY", "WET", "DRY", "DRY", "DRY", "DRY", "WET"),
svalue = c(1L, 0L, 1L, 2L, 3L, 4L, 0L)), class = "data.frame",
row.names = c(NA, -7L))
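For comparison, the explicit loop from the question could also be adapted to restart the count, by checking whether YEAR and MONTH match the previous row. A minimal sketch, assuming the rows are sorted by date as in the example:

```r
dat <- read.table(text = "
YEAR MONTH DAY PCP SPELL
1950 12 28 0 DRY
1950 12 29 11.7 WET
1950 12 30 0 DRY
1950 12 31 0 DRY
1951 01 01 0 DRY
1951 01 02 0 DRY
1951 01 03 20.3 WET
", header = TRUE)

dfx <- data.frame(dat, svalue = NA_integer_)
dfx$svalue[1] <- if (dfx$SPELL[1] == "DRY") 1L else 0L
for (i in 2:nrow(dfx)) {
  # TRUE when row i belongs to the same YEAR-MONTH as row i - 1
  same_month <- dfx$YEAR[i] == dfx$YEAR[i - 1] && dfx$MONTH[i] == dfx$MONTH[i - 1]
  if (dfx$SPELL[i] != "DRY") {
    dfx$svalue[i] <- 0L                        # wet day: reset
  } else if (same_month) {
    dfx$svalue[i] <- dfx$svalue[i - 1] + 1L    # continue the dry spell
  } else {
    dfx$svalue[i] <- 1L                        # new month: start counting again
  }
}
dfx$svalue
# [1] 1 0 1 2 1 2 0
```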