I'm new to R and struggling a bit. I have a data frame like this:
reg 12345
val1 1
val2 0
reg 45678
val1 0
val2 0
val3 1
reg 97654
val1 1
reg 567834
val3 1
reg 567845
val2 0
val4 1
My goal is to convert the data to this format:
reg val1 val2 val3 val4
12345 1 0 0 0
45678 0 0 1 0
97654 1 0 0 0
567834 0 0 1 0
567845 0 0 0 1
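I hope someone can point me in the right direction. My data source has fewer than 200 rows, and there are no restrictions on the approach. Please assume the machine has enough memory and processing power.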
Answer 0: (score: 0)
Even if this is a duplicate, I haven't seen the following answer, so... Starting from the original data:
df <- data.frame( A = c("reg","val1","val2","reg","val1","val2","val3","reg","val1","reg","val3","reg","val2","val4"),
B = c(12345, 1, 0, 45678, 0, 0, 1, 97654, 1, 567834, 1, 567845, 0, 1))
I use tidyverse verbs, plus a trick that uses cumsum to add a label column, dummy, to each "reg" group:
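# install.packages("tidyverse")  # run once if the package is not installed
library(tidyverse)
df1 <- df %>%
  mutate(dummy = cumsum(A == "reg")) %>%  # same label for every row of one "reg" block
  group_by(dummy) %>%
  nest() %>%
  mutate(data = map(data, ~ spread(.x, A, B))) %>%  # widen each block to a single row
  unnest(data) %>%
  ungroup() %>%  # drop the grouping so that dummy can be removed
  select(-dummy)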
This gives:
reg val1 val2 val3 val4
1 12345 1 0 NA NA
2 45678 0 0 1 NA
3 97654 1 NA NA NA
4 567834 NA NA 1 NA
5 567845 NA 0 NA 1
I prefer to keep the NAs, but if you don't:
df1[is.na(df1)] <- 0
reg val1 val2 val3 val4
1 12345 1 0 0 0
2 45678 0 0 1 0
3 97654 1 0 0 0
4 567834 0 0 1 0
5 567845 0 0 0 1
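To see why the cumsum trick works, it can help to look at the labels on their own. The following is a minimal base R sketch (no tidyverse needed) that rebuilds only the A column from the data above; the variable name labels is mine, chosen for illustration.

```r
# Same key column as column A of df in this answer
A <- c("reg", "val1", "val2", "reg", "val1", "val2", "val3",
       "reg", "val1", "reg", "val3", "reg", "val2", "val4")

# A == "reg" is TRUE on the first row of every record;
# cumsum() counts the TRUEs seen so far, so the counter
# goes up by one each time a new record starts
labels <- cumsum(A == "reg")

# Show each key next to the record label it received
print(data.frame(A = A, dummy = labels))
```

Every row between one "reg" and the next receives the same label, which is what lets group_by(dummy) treat each block as one record.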
Answer 1: (score: 0)
Here is an option using dcast from data.table:
library(data.table)
dcast(setDT(df), cumsum(A=="reg") ~ A, value.var = "B", fill = 0)[, A := NULL][]
# reg val1 val2 val3 val4
#1: 12345 1 0 0 0
#2: 45678 0 0 1 0
#3: 97654 1 0 0 0
#4: 567834 0 0 1 0
#5: 567845 0 0 0 1
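If installing packages is not an option, the same grouping idea can probably be expressed in base R as well. The sketch below is not taken from either answer. It assumes each key appears at most once per "reg" block, because xtabs() sums values that land in the same cell, and it fills missing combinations with 0, much like fill = 0 above. The names tab and wide are illustrative.

```r
# Original long data, as defined in the first answer
df <- data.frame(
  A = c("reg", "val1", "val2", "reg", "val1", "val2", "val3",
        "reg", "val1", "reg", "val3", "reg", "val2", "val4"),
  B = c(12345, 1, 0, 45678, 0, 0, 1, 97654, 1, 567834, 1, 567845, 0, 1)
)

# Label each record: the counter increases at every "reg" row
df$dummy <- cumsum(df$A == "reg")

# Cross-tabulate B by record label (rows) and key (columns);
# combinations that never occur come out as 0
tab <- xtabs(B ~ dummy + A, data = df)

# Turn the contingency table into a plain data frame, one row per record
wide <- as.data.frame.matrix(tab)
rownames(wide) <- NULL
print(wide)
```

Because xtabs() adds up duplicates, a record that repeats a key would silently get the sum of its values, so this variant is only a reasonable fit for data shaped like the example.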