I have a data frame with 5 variables and 1000 rows. The first few rows are shown below:
id trade category gender experience
1 carpenter c m no
1 mason b m yes
2 electrician a f no
2 plumber a f no
2 carpenter c f yes
2 mason d f no
3 plumber a m no
4 mason b m yes
4 plumber m no
4 electrician b m no
I tried spread and reshape to convert from long to wide format.
I would like to see the following output.
id trade1 category1 trade2 category2 trade3 category3 trade4 category4 gender
1 carpenter c mason b na na na na m
2 electrician a plumber a carpenter c mason d f
3 plumber a na na na na na na m
4 mason b plumber na electrician b na na m
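Answer 0 (score: 0)
Using tidyverse, we can remove the experience column, since it is not needed in the final output, gather the data frame to long format, group_by id and key, create a new group identifier key1, and spread back to wide format.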
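library(tidyverse)
df1 <- df %>%
  select(-experience) %>%               # experience is not needed in the output
  gather(key, value, -id, -gender) %>%  # long format: one row per id/gender/key/value
  group_by(id, key) %>%
  mutate(key1 = paste(key, row_number(), sep = "_")) %>%  # e.g. trade_1, trade_2, ...
  ungroup() %>%
  select(-key) %>%
  spread(key1, value)                   # wide format: one column per key1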
df1
# A tibble: 4 x 10
# id gender category_1 category_2 category_3 category_4 trade_1 trade_2 trade_3 trade_4
# <int> <fct> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr>
#1 1 m c b NA NA carpenter mason NA NA
#2 2 f a a c d electrician plumber carpenter mason
#3 3 m a NA NA NA plumber NA NA NA
#4 4 m b b b NA mason plumber electrician NA
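As a possible alternative, newer versions of tidyr offer pivot_wider(), which supersedes spread(). The sketch below is not part of the original answer; it assumes tidyr 1.2.0 or later (for the names_vary argument) and rebuilds the sample data as plain character columns. The helper column n and the result name df2 are made up for illustration.

```r
library(dplyr)
library(tidyr)

# Same values as the Data section of this answer, as character columns
df <- data.frame(
  id = c(1L, 1L, 2L, 2L, 2L, 2L, 3L, 4L, 4L, 4L),
  trade = c("carpenter", "mason", "electrician", "plumber", "carpenter",
            "mason", "plumber", "mason", "plumber", "electrician"),
  category = c("c", "b", "a", "a", "c", "d", "a", "b", "b", "b"),
  gender = c("m", "m", "f", "f", "f", "f", "m", "m", "m", "m"),
  experience = c("no", "yes", "no", "no", "yes", "no", "no", "yes", "no", "no"),
  stringsAsFactors = FALSE
)

df2 <- df %>%
  select(-experience) %>%
  group_by(id) %>%
  mutate(n = row_number()) %>%   # position of each row within its id (hypothetical helper column)
  ungroup() %>%
  pivot_wider(
    id_cols = c(id, gender),
    names_from = n,
    values_from = c(trade, category),
    names_sep = "_",
    names_vary = "slowest"       # interleave: trade_1, category_1, trade_2, category_2, ...
  )

df2
```

With names_vary = "slowest" the trade and category columns come out interleaved by number, so no separate reordering step is needed; ids with fewer rows get NA in the unused columns.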
We can arrange the columns in the required format
df1 %>%
select(id, gender, ends_with("1"), ends_with("2"), ends_with("3"), ends_with("4"))
Or, more generally
cbind(df1[1:2], df1[-c(1:2)][order(readr::parse_number(names(df1[-(1:2)])))])
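The generic version above works by sorting the column names on their numeric suffix; because ties keep their original order, category_N ends up before trade_N. A minimal base R sketch of the same idea, with an extra tie-break that puts trade first as the question asks, might look like this. The vector nms is a hypothetical stand-in for names(df1)[-(1:2)], and the names are assumed to have the form prefix_number.

```r
# Hypothetical column names, in the order spread() produces them
nms <- c("category_1", "category_2", "trade_1", "trade_2")

# Numeric suffix after the last underscore
suffix <- as.integer(sub(".*_", "", nms))

# Sort by suffix first; within a suffix, FALSE sorts before TRUE,
# so names starting with "trade" come before the others
ord <- order(suffix, !startsWith(nms, "trade"))
reordered <- nms[ord]
reordered
```

The resulting index vector could then be used in place of the order(...) call when subsetting df1[-(1:2)].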
Data
df <- structure(list(id = c(1L, 1L, 2L, 2L, 2L, 2L, 3L, 4L, 4L, 4L),
trade = structure(c(1L, 3L, 2L, 4L, 1L, 3L, 4L, 3L, 4L, 2L
), .Label = c("carpenter", "electrician", "mason", "plumber"
), class = "factor"), category = structure(c(3L, 2L, 1L,
1L, 3L, 4L, 1L, 2L, 2L, 2L), .Label = c("a", "b", "c", "d"
), class = "factor"), gender = structure(c(2L, 2L, 1L, 1L,
1L, 1L, 2L, 2L, 2L, 2L), .Label = c("f", "m"), class = "factor"),
experience = structure(c(1L, 2L, 1L, 1L, 2L, 1L, 1L, 2L,
1L, 1L), .Label = c("no", "yes"), class = "factor")), class =
"data.frame", row.names = c(NA, -10L))