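I'm new to R and struggling with a problem.
I need to generate random permutations of my dataset. I have 5 categories (age classes): 10, 16, 21, 26, 36. These 5 categories are arranged into groups of sightings, for example (about 2000 groups in total):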
10,10,16,21
16,16,16
36
21
21,26
21,10
10,10,16
16
21
26, 16
16,16,16,16,21,16,10
16,21,16
26
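I need to randomly permute these groups while keeping the number of values in each age class (10, 16, 21, 26, 36) the same and keeping the number of groups of each size the same (e.g., in the example above there should still be 5 groups with only 1 member and 3 groups with 3 members each).
I would greatly appreciate your help.
Answer 0 (score: 0)
Here is one way to read the data, generate a random permutation, and save it to a file.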
library("tidyverse")
input <- "10,10,16,21
16,16,16
36
21
21,26
21,10
10,10,16
16
21
26,16
16,16,16,16,21,16,10
16,21,16
26
"
read_data <- function(input) {
  data <-
    read_lines(input) %>% # Or read them from a text file with `read_lines(filename)`
    enframe("group", "category") %>% # Convert to a data frame with two columns:
    # "group": the group ID, which is the same as the row number
    # "category": the sequence of sightings as a string
    mutate(category = str_split(category, ",")) %>% # Split "10,16,26" into c("10","16","26"), etc.
    unnest(category) %>% # "Unfold" the data frame so that each row is a group ID and an element
    # e.g., c("10","16","26") will unnest into three rows.
    mutate(category = as.integer(category)) # Convert "10" into 10, might not be strictly necessary
  data
}
permute_data <- function(data) {
  data %>%
    mutate(shuffled = sample(category, n(), replace = FALSE)) %>% # Keep the IDs ordered,
    # shuffle the observations.
    group_by(group) %>% # Now a group originally with 3 elements is
    summarise(shuffled = paste(shuffled, collapse = ",")) # matched with 3 randomly selected elements.
    # Join those together in the input format.
}
data <- read_data(input)
data2 <- permute_data(data) # Permute the data, once or multiple times
# The result is a data frame with the original group IDs in the "group" column,
# and the shuffled groups in the "shuffled" column
data2
data2$shuffled %>% write_lines("shuffled.txt") # Save just the shuffled column to a new file.
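
To double-check that a shuffle of this kind keeps both the group sizes and the per-category counts from the question, a minimal base-R sketch along the following lines may help. It is independent of the tidyverse code above; the names `groups`, `sizes`, `pooled`, and `shuffled` and the seed are assumptions made for illustration only.

```r
# Hypothetical stand-alone check, using the example groups from the question.
set.seed(1)  # assumed seed, only so the run is repeatable

groups <- list(
  c(10, 10, 16, 21), c(16, 16, 16), 36, 21, c(21, 26), c(21, 10),
  c(10, 10, 16), 16, 21, c(26, 16), c(16, 16, 16, 16, 21, 16, 10),
  c(16, 21, 16), 26
)

sizes  <- lengths(groups)                     # size of each group, in original order
values <- unlist(groups)                      # all observations pooled together
pooled <- values[sample.int(length(values))]  # random permutation of the pooled values

# Deal the shuffled values back out into groups of the original sizes.
ids      <- rep(seq_along(groups), times = sizes)
shuffled <- unname(split(pooled, ids))

# Both invariants from the question should hold for any permutation.
stopifnot(
  identical(lengths(shuffled), sizes),                # same group sizes
  identical(sort(unlist(shuffled)), sort(values))     # same count per category
)
```

Indexing with `sample.int()` is used here instead of calling `sample()` on the values directly, because `sample(x)` treats a single number `x` as the range `1:x`, which could matter if the pooled vector ever had length one.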