I'm trying to use pivot_wider on the following data:
dates yes_no
1 2017-01-01 0
2 2017-01-02 1
3 2017-01-03 0
4 2017-01-04 1
5 2017-01-05 1
The expected output is:
dates yes_no 2017-01-02_1 2017-01-04_1 2017-01-05_1
1 2017-01-01 0 0 0 0
2 2017-01-02 1 1 0 0
3 2017-01-03 0 0 0 0
4 2017-01-04 1 0 1 0
5 2017-01-05 1 0 0 1
In other words, I want to spread the dates into new indicator columns only for the rows where yes_no is 1.
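This doesn't work for me, because it creates a column for every date instead of only the dates where yes_no is 1: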
d %>%
mutate(value_for_one_hot = 1) %>%
pivot_wider(names_from = dates, values_from = value_for_one_hot,
names_prefix = "date_", values_fill = list(value_for_one_hot = 0))
Data:
d <- data.frame(
dates = c("2017-01-01", "2017-01-02", "2017-01-03", "2017-01-04", "2017-01-05"),
yes_no = c(0, 1, 0, 1, 1)
)
Answer 0 (score: 1)
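Create a duplicate of yes_no, add a new column holding the target column name (only where yes_no is 1, NA otherwise), and then do a regular spread or pivot_wider: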
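library(dplyr)
library(tidyr)
d %>%
  mutate(yes_no_dup = yes_no,
         # target column name only where yes_no is 1, NA otherwise
         cols = if_else(yes_no == 1, paste0(dates, '_1'), NA_character_)) %>%
  spread(cols, yes_no_dup, fill = 0) %>%  # fill takes a single value
  select(-`<NA>`)                          # drop the column created for the NA key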
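A minimal sketch of the pivot_wider route mentioned above, assuming tidyr >= 1.1.0 (for a scalar values_fill) and dplyr >= 1.0.0 (for across()). Instead of spreading an NA key and then deleting that column, it pivots only the rows where yes_no is 1 and joins the result back onto all rows. The helper column names cols and indicator are illustrative, not part of the original answer.

```r
library(dplyr)
library(tidyr)

d <- data.frame(
  dates = c("2017-01-01", "2017-01-02", "2017-01-03", "2017-01-04", "2017-01-05"),
  yes_no = c(0, 1, 0, 1, 1)
)

# Pivot only the rows where yes_no == 1: one "<date>_1" column per such row,
# with 1 in that row's own column and 0 elsewhere (values_fill)
wide <- d %>%
  filter(yes_no == 1) %>%
  transmute(dates, cols = paste0(dates, "_1"), indicator = 1) %>%
  pivot_wider(names_from = cols, values_from = indicator, values_fill = 0)

# Join back onto every original row (left_join keeps d's row order);
# rows with yes_no == 0 get NA in the new columns, which is replaced by 0
out <- d %>%
  left_join(wide, by = "dates") %>%
  mutate(across(ends_with("_1"), ~ coalesce(.x, 0)))

out
```

Filtering first means no NA key is ever created, so there is no placeholder column to remove afterwards.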
Answer 1 (score: 1)
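Here is a data.table approach that doesn't reshape the data at all: it adds one indicator column per non-zero date and fills those columns with an identity matrix.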
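library(data.table)
setDT(d)                                   # convert d to a data.table by reference
ind <- d[['yes_no']] != 0                  # rows that get their own column
cols <- as.character(d[['dates']])[ind]    # one new column per such date
d[, (cols) := 0]                           # initialise new columns to 0 (double, like diag())
d[ind, (cols) := as.data.frame(diag(.N))]  # identity matrix: 1 in each row's own column
## also valid
# set(d, which(ind), cols, as.data.frame(diag(length(cols))))
d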
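For comparison, a minimal base-R sketch of the same no-reshape idea, with no data.table or tidyr dependency. It keeps the original rows and adds one indicator column per date where yes_no is non-zero. The "_1" suffix follows the question's expected output, and the names rows, cols and out are illustrative.

```r
d <- data.frame(
  dates = c("2017-01-01", "2017-01-02", "2017-01-03", "2017-01-04", "2017-01-05"),
  yes_no = c(0, 1, 0, 1, 1)
)

rows <- which(d$yes_no != 0)          # rows that get their own column
cols <- paste0(d$dates[rows], "_1")   # new column names, e.g. "2017-01-02_1"

out <- d
for (j in seq_along(rows)) {
  # 1 only in the row that produced this column, 0 everywhere else
  out[[cols[j]]] <- as.numeric(seq_len(nrow(d)) == rows[j])
}
out
```

Taken together, the new columns form the same identity-matrix block that diag(.N) builds in the data.table version, just written one column at a time.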