I have a data frame with a "grouping" variable, season, and another variable, year, which is repeated for every month.
df <- data.frame(month = as.character(sapply(month.name,function(x)rep(x,4))),
season = c(rep("winter",8),rep("spring",12),rep("summer",12),rep("autumn",12),rep("winter",4)),
year = rep(2021:2024,12))
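I would like to use dplyr::sample_n or something similar to pick 2 months per season from the data frame, keeping the same months across all years, for example: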
month season year
1 January winter 2021
2 January winter 2022
3 January winter 2023
4 January winter 2024
5 February winter 2021
6 February winter 2022
7 February winter 2023
8 February winter 2024
9 March spring 2021
10 March spring 2022
11 March spring 2023
12 March spring 2024
13 May spring 2021
14 May spring 2022
15 May spring 2023
16 May spring 2024
17 June summer 2021
18 June summer 2022
19 June summer 2023
20 June summer 2024
21 July summer 2021
22 July summer 2022
23 July summer 2023
24 July summer 2024
25 October autumn 2021
26 October autumn 2022
27 October autumn 2023
28 October autumn 2024
29 November autumn 2021
30 November autumn 2022
31 November autumn 2023
32 November autumn 2024
I can't use df %>% group_by(season,year) %>% sample_n(2), because it picks different months for each year.
Thanks!
Answer 0 (score: 3)
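We can group by season, use sample() to draw 2 of the unique month values within each group, and filter() the rows that match them. Because the draw happens once per season rather than once per year, the same months are kept for every year.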
library(dplyr)
df %>%
group_by(season) %>%
filter(month %in% sample(unique(month),2))
# month season year
# <chr> <chr> <int>
# 1 January winter 2021
# 2 January winter 2022
# 3 January winter 2023
# 4 January winter 2024
# 5 February winter 2021
# 6 February winter 2022
# 7 February winter 2023
# 8 February winter 2024
# 9 March spring 2021
#10 March spring 2022
# … with 22 more rows
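To check that the months really stay the same across years, here is a minimal sketch. It rebuilds the example data, fixes a seed (the value 42 is arbitrary and only makes the draw repeatable), and counts rows per selected month. The name picked is introduced here for illustration.

```r
library(dplyr)

# Rebuild the example data from the question
df <- data.frame(
  month  = rep(month.name, each = 4),
  season = c(rep("winter", 8), rep("spring", 12), rep("summer", 12),
             rep("autumn", 12), rep("winter", 4)),
  year   = rep(2021:2024, 12)
)

set.seed(42)  # arbitrary seed, only so the draw is repeatable

picked <- df %>%
  group_by(season) %>%
  filter(month %in% sample(unique(month), 2)) %>%
  ungroup()

# Each season should keep exactly 2 distinct months
months_per_season <- picked %>%
  group_by(season) %>%
  summarise(n_months = n_distinct(month))

# Every kept month should appear once per year, i.e. 4 rows each
rows_per_month <- picked %>% count(season, month)
```

If the selection works as intended, n_months is 2 for every season and n is 4 for every month in rows_per_month.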
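If some groups have fewer than 2 unique values, we can use min() to cap the sample size at the number of unique values in the group, so that sample() never asks for more values than exist.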
df %>%
group_by(season) %>%
filter(month %in% sample(unique(month),min(2, n_distinct(month))))
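The guard can be illustrated with a small made-up data frame in which one group has a single month. The data frame small and the season "monsoon" are hypothetical and exist only for this sketch.

```r
library(dplyr)

# Hypothetical data: "summer" has two months, "monsoon" has only one
small <- data.frame(
  month  = c(rep("June", 2), rep("July", 2), rep("August", 2)),
  season = c(rep("summer", 4), rep("monsoon", 2)),
  year   = rep(2021:2022, 3)
)

# With the min() guard, the one-month group simply keeps its single month.
# Without it, sample(unique(month), 2) fails for "monsoon", because
# sample() cannot draw 2 values from 1 without replacement.
res <- small %>%
  group_by(season) %>%
  filter(month %in% sample(unique(month), min(2, n_distinct(month)))) %>%
  ungroup()
```

Here every row survives: "summer" has exactly 2 months, so both are drawn, and "monsoon" keeps August.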
Using the same logic in base R, we can use ave:
df[as.logical(with(df, ave(month, season,
FUN = function(x) x %in% sample(unique(x),2)))), ]
Answer 1 (score: 0)
Using slice:
library(dplyr)
df %>%
group_by(season) %>%
slice(which(!is.na(match(month, sample(unique(month), 2)))))
# A tibble: 32 x 3
# Groups: season [4]
# month season year
# <fct> <fct> <int>
# 1 October autumn 2021
# 2 October autumn 2022
# 3 October autumn 2023
# 4 October autumn 2024
# 5 November autumn 2021
# 6 November autumn 2022
# 7 November autumn 2023
# 8 November autumn 2024
# 9 April spring 2021
#10 April spring 2022
# … with 22 more rows
base R
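A base R version of the same idea could look like the following sketch. It is one possible approach under stated assumptions, not necessarily the code this answer had in mind: the names keep, keep_keys and res are made up for illustration, and the seed is arbitrary.

```r
# Rebuild the example data from the question
df <- data.frame(
  month  = rep(month.name, each = 4),
  season = c(rep("winter", 8), rep("spring", 12), rep("summer", 12),
             rep("autumn", 12), rep("winter", 4)),
  year   = rep(2021:2024, 12)
)

set.seed(1)  # arbitrary seed, only so the draw is repeatable

# Draw 2 months per season once, independently of year
keep <- lapply(split(df$month, df$season),
               function(m) sample(unique(m), 2))

# Build "season month" keys for the drawn months
keep_keys <- unlist(lapply(names(keep),
                           function(s) paste(s, keep[[s]])))

# Keep every row whose season/month pair was drawn
res <- df[paste(df$season, df$month) %in% keep_keys, ]
```

Drawing the months before subsetting makes it explicit that year plays no part in the selection, so each chosen month keeps all of its years.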