Question

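I need to generate and save multiple files based on randomizations of a data frame. The original data frame contains several years of daily weather data. I need to generate files in which these years are randomly reshuffled, while keeping each year's sequence of days intact.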

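I have written simple code that randomizes the years, but I'm having trouble repeating the randomization and saving each randomized output data frame as a separate file.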

Here is what I have so far:

# Create example data frame
df <- data.frame(x=c(1,1,1,2,2,2,3,3,3,4,4,4,5,5,5,6,6,6,7,7,8,8))
df$y <- c(4,8,9,1,1,5,8,8,3,2,0,9,4,4,7,3,5,5,2,4,6,6)
df$z <- c("A","A","A","B","B","B","C","C","C","D","D","D","F","F","F","G","G","G","H","H","I","I")

set.seed(30)

# Split data frame based on info in one column (i.e. df$x) and store in a list 
dt_list <- split(df, f = df$x)

# RANDOMIZE data list -- Create a new index and change the order of dt_list
# SAVE the result to "random list" (i.e. 'rd_list')

rd_list <- dt_list[sample(1:length(dt_list), length(dt_list))]

# Put back together data in the order established in 'rd_list' 
rd_data <- do.call(rbind, rd_list)

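This randomizes the data frame as needed, but I don't know how to "save and repeat" so that I end up with multiple files, say about 20, each named after the original with a sequential number (e.g. df_1, df_2, ...).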

Also, since these are random samples, duplicates are possible. Is there a way to automatically discard duplicate files?

Thanks!

Answer 1

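Here is an approach that uses a while loop and the handy sample_n() function from the dplyr package, which samples a specified number of rows from a data frame (with or without replacement).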

library(dplyr)

# Create the data
weather_data <- data.frame(Weather = c("Sunny", "Cloudy", "Rainy", "Sunny"),
                           Temperature = c(75, 68, 71, 76))

# Twenty times, repeatedly sample rows from the data and write to a csv file
total_files <- 20
df_index <- 1

while (df_index <= total_files) {
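  # Draw 10 rows with replacement (sample_n() is superseded by
  # slice_sample(n = 10, replace = TRUE) in dplyr >= 1.0.0, but still works)
  sampled_subset <- sample_n(weather_data,
                             size = 10,
                             replace = TRUE)

  # Write the data to a csv file; write.csv() always uses "," as the
  # separator and warns if 'sep' is supplied, so it is omitted here
  filename_to_use <- paste0("Sample_Data", "_", df_index, ".csv")

  write.csv(x = sampled_subset,
            file = filename_to_use)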

  df_index <- df_index + 1
}
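The loop above writes resampled rows rather than the year-level shuffle asked about in the question. A minimal sketch that wraps the question's own split()/sample() approach in a loop could look like the following. It assumes the files should be named df_1.csv through df_20.csv and writes them to tempdir() (change out_dir to a real folder); the names n_files, out_dir, seen and key are illustrative, not part of the question. Duplicates are discarded by recording each permutation of the years as a string and drawing again whenever that permutation has already been written.

```r
# Example data from the question (x identifies the year)
df <- data.frame(x = c(1,1,1,2,2,2,3,3,3,4,4,4,5,5,5,6,6,6,7,7,8,8))
df$y <- c(4,8,9,1,1,5,8,8,3,2,0,9,4,4,7,3,5,5,2,4,6,6)
df$z <- c("A","A","A","B","B","B","C","C","C","D","D","D","F","F","F","G","G","G","H","H","I","I")

set.seed(30)

dt_list <- split(df, f = df$x)   # one list element per year
n_files <- 20                    # hypothetical: number of files wanted
out_dir <- tempdir()             # assumption: replace with your output folder

# Guard against an endless loop: there are only factorial(n_years) orderings
stopifnot(n_files <= factorial(length(dt_list)))

seen <- character(0)             # year orderings already written
i <- 0
while (i < n_files) {
  perm <- sample(seq_along(dt_list))    # random order of the years
  key  <- paste(perm, collapse = "-")   # e.g. "3-1-8-..." identifies the ordering
  if (key %in% seen) next               # duplicate ordering: discard and redraw
  seen <- c(seen, key)
  i <- i + 1

  # Rows inside each year keep their original order; only the years move
  rd_data <- do.call(rbind, dt_list[perm])
  write.csv(rd_data,
            file = file.path(out_dir, paste0("df_", i, ".csv")),
            row.names = FALSE)
}
```

Because x identifies the year and each year stays a single block, two output files are identical only if their year orderings are identical, so comparing permutations is enough to avoid duplicate files here. With 8 years there are 8! = 40320 possible orderings, so 20 unique files are easy to reach.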

r - Generate multiple files by randomizing a data frame
