Question

I have a data frame named mat_new. Here is how the data is generated:

      library(dplyr)

      year <- rep(1980:2015, each = 365) 
      doy <- rep(1:365, times = 36)

      set.seed(125) 
      val <- sample(0:1, size = 365*36,replace = TRUE) 
      mat <- as.matrix(cbind(year,doy,val))
      mat <- as.data.frame(mat)
      mat <- mat %>% 
              mutate(doy1 = rep(1:730, times = 18))
      mat <- mat[,c(1:2,4,3)]

      set.seed(123) 
      mat1 <- apply(matrix(sample(c(230:365), replace = TRUE, size = 2L * 36L), nrow = 36L), 2L, sort)
      mat1 <- t(apply(mat1, 1, function(x) x[order(x)]))
      colnames(mat1) <- c("D1", "D2")
      mat1 <- cbind(year = 1980:2015, mat1)
      mat1 <- as.data.frame(mat1)

      mat1[1:6,3] <- 5:10

      mat1 <- mat1 %>%
                mutate(D2 = ifelse(D1 > D2, D2 + 365, D2))

      mat_new <- mat %>% 
                 left_join(mat1, by = "year")

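mat_new has six columns. Column 1 is the year. Column 2 is doy (day of year, 365 days in every year). Column 3 is doy1, which runs from 1 to 730 (two years) and then repeats from 1 to 730. Column 4 holds some values (val). Columns 5 and 6 give a start day (D1) and an end day (D2) for each year. If D2 > 365, the end date falls in the next year. For example, for 1980 the end day is 370, which is day 5 of 1981.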

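I need to subset val for each year according to that year's start and end days. For example, for 1980 the subset of val should run from day 233 of 1980 to day 5 of 1981 (370 is the end day). My idea was to first create another column of TRUE and FALSE, which I could then use to subset val: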

      mat_new1 <- mat_new %>% 
                    mutate(group1 = ifelse(D2 <= 365, doy >= D1 & doy <= D2 , doy >= D1 & doy1 <= D2))

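The line above should create another column, group1, holding TRUE and FALSE. If D2 <= 365, i.e. the end date falls in the same year, it uses the doy column to select D1 through D2. If D2 falls in the next year (D2 > 365), it uses doy for the start day and takes the end day from the doy1 column. The code runs, but for 1980 (and other years) group1 is TRUE only from D1 to day 365 of 1980, not through 5 January 1981 (doy1 of 370).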

What am I doing wrong here?

Answer 1

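Here is one option. The idea is to filter the data frame by D1 and D2 for days in the same year, and then to filter separately for the days that fall in the next year. To do this, D2 is adjusted so that the days spilling into the next year are counted separately, which means this approach needs two look-up tables. mat_new3 is the final output.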
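To see why the original `ifelse()` stops at day 365, it may help to evaluate it on a few rows around the 1980/1981 boundary. The sketch below is only an illustration: the data frame `rows` is a hypothetical four-row extract, with D1 and D2 copied from the output shown further down. Each row carries the D1 and D2 of its own year, so the 1981 rows are tested against 1981's start day (235) and never against 1980's window.

```r
library(dplyr)

# Hypothetical four-row extract around the year boundary
# (D1/D2 values copied from the answer's printed output)
rows <- data.frame(
  year = c(1980, 1980, 1981, 1981),
  doy  = c(364, 365, 1, 5),
  doy1 = c(364, 365, 366, 370),
  D1   = c(233, 233, 235, 235),
  D2   = c(370, 370, 371, 371)
)

# Same condition as in the question
check <- rows %>%
  mutate(group1 = ifelse(D2 <= 365,
                         doy >= D1 & doy <= D2,
                         doy >= D1 & doy1 <= D2))

# The 1981 rows fail `doy >= D1` (1 >= 235 and 5 >= 235 are FALSE),
# so group1 is TRUE, TRUE, FALSE, FALSE
print(check)
```

The days in early 1981 therefore have to be matched against the previous year's D2, which is what the second look-up table does.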

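By the way, some years are leap years, so they have 366 days. It looks like you assume every year has 365 days. I just want to make sure you are aware of this and that it will not affect your analysis.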
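As a quick check of how many years are affected, the Gregorian leap-year rule can be applied to the range 1980:2015. This is a minimal sketch; `is_leap` is a helper name introduced here for illustration and is not part of the original code.

```r
# Gregorian rule: divisible by 4, except centuries not divisible by 400
is_leap <- function(y) (y %% 4 == 0 & y %% 100 != 0) | (y %% 400 == 0)

years        <- 1980:2015
leap_years   <- years[is_leap(years)]
days_in_year <- ifelse(is_leap(years), 366L, 365L)

print(leap_years)
print(sum(days_in_year))
```

Since 1980 is itself a leap year, a day index of 370 counted on the real calendar would land on day 4 of 1981 rather than day 5, so the 365-day convention only works if it is applied consistently.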

# Look-up table for the same year
mat_day <- mat_new %>% 
  select(year, D1, D2) %>%
  distinct() %>%
  # Create a column D_next holding the number of days that spill into the next year
  # Then cap D2 at 365 for the years where the window runs past the year end
  mutate(D_next = ifelse(D2 > 365, D2 - 365, 0),
         D2 = D2 - D_next)

# Look-up table for the next year
mat_day_next <- mat_day %>%
  # Update the year column to represent the next year
  mutate(year = year + 1) %>%
  # Remove year if it is larger than the maximum of the original year
  filter(year <= max(mat_day$year)) %>%
  # Remove D_next == 0
  filter(D_next != 0) %>%
  # Remove D1 and D2
  select(-D1, -D2) %>%
  # Create a column showing the beginning day of the next year
  mutate(D1 = 1, D2 = D_next)

# Filter rows for the same year  
mat_new1 <- mat_new %>%
  # Join with mat_day by year
  left_join(mat_day, by = c("year")) %>%
  group_by(year) %>%
  # Filter by D1.y and D2.y (D1 and D2 from mat_day)
  filter(doy >= D1.y & doy <= D2.y) %>%
  ungroup()

# Filter rows for the next year
mat_new2 <- mat_new %>%
  # Join with mat_day_next by year
  left_join(mat_day_next, by = c("year")) %>%
  group_by(year) %>%
  # Filter by D1.y and D2.y (D1 and D2 from mat_day_next)
  filter(doy >= D1.y & doy <= D2.y) %>%
  ungroup()

# Combine the results 
mat_new3 <- bind_rows(mat_new1, mat_new2) %>%
  arrange(year, doy, doy1) %>%
  select(-D1.y, -D2.y, -D_next) %>%
  rename(D1 = D1.x, D2 = D2.x) %>%
  ungroup()

# View the first 6 rows from the year 1980
mat_new3 %>% head()
# # A tibble: 6 x 6
#    year   doy  doy1   val    D1    D2
#   <dbl> <int> <int> <int> <int> <dbl>
# 1  1980   233   233     0   233   370
# 2  1980   234   234     1   233   370
# 3  1980   235   235     0   233   370
# 4  1980   236   236     0   233   370
# 5  1980   237   237     0   233   370
# 6  1980   238   238     1   233   370

# View the last 10 rows from the year 1980
mat_new3 %>%
  slice(1:(370 - 233 + 1)) %>%
  tail(10)
# # A tibble: 10 x 6
#     year   doy  doy1   val    D1    D2
#    <dbl> <int> <int> <int> <int> <dbl>
#  1  1980   361   361     0   233   370
#  2  1980   362   362     1   233   370
#  3  1980   363   363     0   233   370
#  4  1980   364   364     0   233   370
#  5  1980   365   365     1   233   370
#  6  1981     1   366     0   235   371
#  7  1981     2   367     1   235   371
#  8  1981     3   368     0   235   371
#  9  1981     4   369     1   235   371
# 10  1981     5   370     0   235   371
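
For comparison, the same selection can probably be written in a single pass by carrying each year's overflow forward with `dplyr::lag()`. This is only a sketch on made-up toy data, not the method above: `windows`, `days`, `carry` and `window_year` are names introduced here, and it assumes every year has 365 days and that the years are consecutive with no gaps.

```r
library(dplyr)

# Toy windows (made-up values); D2 above 365 spills into the next year
windows <- data.frame(
  year = 1980:1982,
  D1   = c(233, 235, 300),
  D2   = c(370, 371, 340)
)

# One row per day, 365 days in every year (assumption)
days <- data.frame(
  year = rep(1980:1982, each = 365),
  doy  = rep(1:365, times = 3)
)

# carry = number of days the previous year's window extends into this year
windows <- windows %>%
  arrange(year) %>%
  mutate(carry = dplyr::lag(pmax(D2 - 365, 0), default = 0))

selected <- days %>%
  left_join(windows, by = "year") %>%
  mutate(
    in_own_window = doy >= D1 & doy <= pmin(D2, 365),
    in_carry_over = doy <= carry,
    # Year whose window the row belongs to
    window_year   = ifelse(in_carry_over & !in_own_window, year - 1, year)
  ) %>%
  filter(in_own_window | in_carry_over)

# Days 1 to 5 of 1981 are kept and attributed to the 1980 window
print(selected %>% filter(year == 1981, doy <= 5))
```

The `window_year` column records which year's window each kept row belongs to, which the two-table version leaves implicit.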
