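I have a data frame, testData, that consists of many unique IDs. My goal is to determine, for each ID, whether it contains every possible value of month, yday, and week. In other words, if an id contains all possible values in the range of month, it should receive t in the month column. Likewise, if an id contains all possible values in the range of yday, it should receive t for yday, and if it contains all possible values in the range of week, it should receive t for week. Otherwise, it should receive f.
A sample of the data looks like this: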
> testData
   id month yday week
1   1     1    1    1
2   3     1    2    1
3   4     1    3    1
4   2     1    4    1
5   3     3    5    1
6   4     1    6    1
7   2     1    7    1
8   3     1    8    2
9   1     1    9    2
10  5     1   10    2
11  3     2   11    1
12  4     1   12    1
13  5     1   13    1
14  1     1   14    1
The output should look like this:
> output
  id month yday week
1  1     f    f    t
2  2     f    f    f
3  3     t    f    t
4  4     f    f    f
5  5     f    f    t
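I know that findInterval() can check whether a number falls within a range, but can anyone suggest a way to check whether the numbers in a vector include all the integers in a range?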
> dput(testData)
structure(list(id = c(1L, 3L, 4L, 2L, 3L, 4L, 2L, 3L, 1L, 5L,
3L, 4L, 5L, 1L), month = c(1L, 1L, 1L, 1L, 3L, 1L, 1L, 1L, 1L,
1L, 2L, 1L, 1L, 1L), yday = 1:14, week = c(1L, 1L, 1L, 1L, 1L,
1L, 1L, 2L, 2L, 2L, 1L, 1L, 1L, 1L)), .Names = c("id", "month",
"yday", "week"), class = "data.frame", row.names = c(NA, -14L
))
Answer 0 (score: 0)
This is easy with data.table:
library(data.table)
setDT(testData)  # convert to a data.table by reference
# TRUE when the id's rows cover every distinct value in the whole table
output <- testData[, .(month = all(unique(testData$month) %in% month),
                       yday  = all(unique(testData$yday) %in% yday),
                       week  = all(unique(testData$week) %in% week)),
                   keyby = id]  # keyby sorts the result by id
output
   id month  yday  week
1:  1 FALSE FALSE  TRUE
2:  2 FALSE FALSE FALSE
3:  3  TRUE FALSE  TRUE
4:  4 FALSE FALSE FALSE
5:  5 FALSE FALSE  TRUE
Answer 1 (score: 0)
Here is an approach using dplyr:
library(dplyr)
testData_copy <- testData  # ungrouped copy, used for the table-wide distinct counts
testData %>%
  group_by(id) %>%
  summarise(month = n_distinct(month) == n_distinct(testData_copy$month),
            yday  = n_distinct(yday)  == n_distinct(testData_copy$yday),
            week  = n_distinct(week)  == n_distinct(testData_copy$week))
# A tibble: 5 × 4
     id month yday  week
  <int> <lgl> <lgl> <lgl>
1     1 FALSE FALSE TRUE
2     2 FALSE FALSE FALSE
3     3 TRUE  FALSE TRUE
4     4 FALSE FALSE FALSE
5     5 FALSE FALSE TRUE
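For comparison, the same check can be sketched in base R without any packages. This is only an illustration, not part of either answer: covers_range is a hypothetical helper name, and it assumes that "all possible values" means every integer between the column's overall minimum and maximum, which is how the question phrases it. Both answers above instead compare against the distinct values that actually occur in the column.

```r
# Hypothetical helper (not from the answers): TRUE when x contains
# every integer from lo to hi.
covers_range <- function(x, lo, hi) all(seq(lo, hi) %in% x)

# The data from the question's dput() output.
testData <- data.frame(
  id    = c(1L, 3L, 4L, 2L, 3L, 4L, 2L, 3L, 1L, 5L, 3L, 4L, 5L, 1L),
  month = c(1L, 1L, 1L, 1L, 3L, 1L, 1L, 1L, 1L, 1L, 2L, 1L, 1L, 1L),
  yday  = 1:14,
  week  = c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 1L, 1L, 1L, 1L)
)

cols <- c("month", "yday", "week")

# One small data frame per id, ordered by id.
groups <- split(testData[cols], testData$id)

# For every id and column, test coverage of the column's overall range.
result <- t(sapply(groups, function(g) {
  sapply(cols, function(col) {
    rng <- range(testData[[col]])
    covers_range(g[[col]], rng[1], rng[2])
  })
}))

output <- data.frame(id = as.integer(rownames(result)), result,
                     row.names = NULL)
output
```

On this sample the two interpretations agree, because every column happens to contain all integers between its minimum and maximum. If a column had gaps (say, month values 1 and 3 only), the range-based check would return FALSE for every id, whereas the distinct-value checks in the answers could still return TRUE.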