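I am trying to convert a dataset from long format to wide format. I need to do this so I can feed it into another program for analysis. My input data is as follows: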
sdata <- data.frame(c(1,1,1,1,1,1,1,1,1,1,1,1,1),c(1,1,1,1,1,1,1,1,1,2,2,2,2),c("X1","A","B","C","D","X2","A","B","C","X1","A","B","C"),c(81,31,40,5,5,100,8,90,2,50,20,24,6))
col_headings <- c("Orig","Dest","Desc","Estimate")
names(sdata) <- col_headings
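[Image: input data]
Depending on the unique Orig-Dest-X1 and Orig-Dest-X2 combinations above, the subcategories vary: some have only A, B, C, others have A, B, C, D, others just A, B, and so on. I am trying to get the desired output, recreated in R with the code below. An image of the desired output is also included.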
sdata_spread <- data.frame(c(1,1),c(1,2),c(81,50),c(31,20),c(40,24),c(5,6),c(5,NA),c(100,NA),c(8,NA),c(90,NA),c(2,NA))
col_headings <- c("Orig","Dest","X1", "X1_A", "X1_B", "X1_C", "X1_D","X2", "X2_A", "X2_B", "X2_C")
names(sdata_spread) <- col_headings
[Image: desired output]
I tried the following:
sdata_spread <- sdata %>% spread(Desc,Estimate)
The error I get is:
Error: Each row of output must be identified by a unique combination of keys.
Keys are shared for 6 rows
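The error means that after spreading Desc, Orig and Dest alone no longer identify a single cell. The subcategory labels A, B and C repeat under both X1 and X2 for Orig = 1, Dest = 1. A quick way to see the colliding keys is to count each (Orig, Dest, Desc) combination with dplyr's count(). This is a diagnostic sketch only, not part of the fix. It rebuilds the question's data with named columns and stringsAsFactors = FALSE (an assumption, so it behaves the same on R versions before 4.0).

```r
library(dplyr)

# Input data from the question, built with named columns
sdata <- data.frame(
  Orig     = c(1,1,1,1,1,1,1,1,1,1,1,1,1),
  Dest     = c(1,1,1,1,1,1,1,1,1,2,2,2,2),
  Desc     = c("X1","A","B","C","D","X2","A","B","C","X1","A","B","C"),
  Estimate = c(81,31,40,5,5,100,8,90,2,50,20,24,6),
  stringsAsFactors = FALSE
)

# spread()/pivot_wider() need each (Orig, Dest, Desc) key to occur once;
# keep only the keys that occur more than once
dupes <- sdata %>%
  count(Orig, Dest, Desc) %>%
  filter(n > 1)

print(dupes)
# A, B and C each occur twice for Orig = 1, Dest = 1 (once under X1, once under X2):
# 3 keys x 2 rows = the "6 rows" reported in the error
print(sum(dupes$n))
```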
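I also tried the accepted answer given here: Long to wide with no unique key, and the answer given here: Long to wide format with several duplicates. Circumvent with unique combo of columns, but neither gave me the desired output.
Any insights would be greatly appreciated.
Thanks, Krishnan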
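Answer (score: 1)
One option is to create a grouping variable that starts a new group wherever 'Desc' begins with "X". Within each group, use case_when() to paste the first element of 'Desc' (the header, e.g. "X1") onto every other element, so "A" becomes "X1_A". Then reshape to wide format with pivot_wider(). As of tidyr_1.0.0, spread()/gather() are superseded, and pivot_wider()/pivot_longer() are used in their place.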
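library(dplyr)
library(tidyr)
library(stringr)

sdata %>%
  # grp increases by 1 at every header row ("X1", "X2", ...), so a header and its subcategories share a group
  group_by(grp = cumsum(str_detect(Desc, '^X'))) %>%
  # prefix every non-header row with its group's header, e.g. "A" -> "X1_A"
  mutate(Desc = case_when(row_number() > 1 ~ str_c(first(Desc), Desc, sep = "_"),
                          TRUE ~ as.character(Desc))) %>%
  ungroup %>%
  select(-grp) %>%
  # Orig/Dest/Desc is now unique, so the reshape no longer finds duplicate keys
  pivot_wider(names_from = Desc, values_from = Estimate)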
# A tibble: 2 x 11
#    Orig  Dest    X1  X1_A  X1_B  X1_C  X1_D    X2  X2_A  X2_B  X2_C
#   <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
# 1     1     1    81    31    40     5     5   100     8    90     2
# 2     1     2    50    20    24     6    NA    NA    NA    NA    NA
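If you would rather avoid the tidyverse dependencies, the same idea can be sketched in base R. It numbers each X-block with cumsum(), prefixes the subcategories with the block's header, then widens with stats::reshape(). This is a suggested alternative, not part of the answer above. It assumes, as the dplyr version does, that each header row comes directly before its subcategories. The "Estimate." prefix that reshape() adds to the new columns is stripped afterwards with sub().

```r
# Input data from the question, built with named columns
sdata <- data.frame(
  Orig     = c(1,1,1,1,1,1,1,1,1,1,1,1,1),
  Dest     = c(1,1,1,1,1,1,1,1,1,2,2,2,2),
  Desc     = c("X1","A","B","C","D","X2","A","B","C","X1","A","B","C"),
  Estimate = c(81,31,40,5,5,100,8,90,2,50,20,24,6),
  stringsAsFactors = FALSE
)

# Block id: increments at every header row ("X1", "X2", ...)
grp <- cumsum(grepl("^X", sdata$Desc))

# The first Desc in each block is its header; spread it over the whole block
header <- ave(sdata$Desc, grp, FUN = function(d) d[1])

# Keep header rows as-is and prefix the others, e.g. "A" -> "X1_A"
sdata$Desc <- ifelse(!duplicated(grp), sdata$Desc,
                     paste(header, sdata$Desc, sep = "_"))

# Orig/Dest/Desc is now unique, so reshape() can widen it
wide <- reshape(sdata, idvar = c("Orig", "Dest"), timevar = "Desc",
                v.names = "Estimate", direction = "wide")

# reshape() names the new columns "Estimate.X1", "Estimate.X1_A", ...; drop the prefix
names(wide) <- sub("^Estimate\\.", "", names(wide))
rownames(wide) <- NULL
print(wide)
```

The new columns appear in the order their names first occur in Desc (X1, X1_A, ..., X2_C). Missing combinations, such as X2 for Dest = 2, are filled with NA, as in the pivot_wider() result.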