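I would like to convert an aggregated dataset into a new derived dataset containing the individual instances that correspond to the initial aggregates. I load the Titanic dataset that ships with R and look at its data frame. The frequency of occurrence of each tuple is aggregated (e.g., 20 adult female crew members survived the sinking). I would like to reconstruct the dataset by replacing each tuple with the corresponding non-aggregated tuples (e.g., 20 copies of the tuple "Crew,Female,Adult,Yes"). I know how to aggregate a dataset, but I have not been able to expand one that is already aggregated. Any hint would be much appreciated.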
Answer 0 (score: 0)
library(dplyr)
library(purrr)
library(tidyr)
# keep data with frequency > 0
T <- data.frame(Titanic, stringsAsFactors = FALSE) %>% filter(Freq > 0)
as_tibble(T) %>% # as_tibble() only used to produce a more readable output (i.e. print only a few rows)
  mutate(id = map(Freq, ~ seq_len(.))) %>% # create a vector of ids from 1 to Freq for each row
  unnest(id) # expand the vector into one row per id
# # A tibble: 2,201 x 6
# Class Sex Age Survived Freq id
# <fctr> <fctr> <fctr> <fctr> <dbl> <int>
# 1 3rd Male Child No 35 1
# 2 3rd Male Child No 35 2
# 3 3rd Male Child No 35 3
# 4 3rd Male Child No 35 4
# 5 3rd Male Child No 35 5
# 6 3rd Male Child No 35 6
# 7 3rd Male Child No 35 7
# 8 3rd Male Child No 35 8
# 9 3rd Male Child No 35 9
# 10 3rd Male Child No 35 10
# # ... with 2,191 more rows
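You can remove the id column if you want. I left it there only to make it easier to see how the process works.
You can also check that the new dataset has 2,201 rows, which equals sum(T$Freq). So, as expected, the sum of the frequencies in the original dataset is the number of rows in the new dataset.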
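For comparison, the same expansion and the row-count check can be sketched in base R without any packages. This is not the approach used in the answer above but a swapped-in technique: each row index is repeated Freq times with rep() and the data frame is subset by those indices. The names agg and expanded are made up for this illustration.

```r
# Base R sketch (assumption: an alternative to the dplyr/purrr/tidyr pipeline)
agg <- as.data.frame(Titanic)   # one row per Class/Sex/Age/Survived combination
agg <- agg[agg$Freq > 0, ]      # keep only combinations that actually occur

# Repeat each row index Freq times, then drop the Freq column
row_idx  <- rep(seq_len(nrow(agg)), times = agg$Freq)
expanded <- agg[row_idx, c("Class", "Sex", "Age", "Survived")]
rownames(expanded) <- NULL      # reset row names left over from the subsetting

# The number of expanded rows should equal the sum of the frequencies
nrow(expanded) == sum(agg$Freq)
```

Here the helper id column is never created, so nothing has to be removed afterwards. If the tidyverse is already loaded, tidyr::uncount(agg, Freq) is another option worth looking at for the same task.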