I have this data frame:
dat = data.frame(Type = c("A","A","B","B","C","C","D"), NextType = c("A", "B","B", "C","C","D",NA),
A = c(rep(0,7)),
B = rep(0,7),
C = rep(0,7) ,
D = rep(0,7),
stringsAsFactors = F)
dat
Type NextType A B C D
1 A A 0 0 0 0
2 A B 0 0 0 0
3 B B 0 0 0 0
4 B C 0 0 0 0
5 C C 0 0 0 0
6 C D 0 0 0 0
7 D <NA> 0 0 0 0
What is the best way to fill columns A, B, C and D with 1 wherever the column name (A, B, C, D, etc.) equals both Type and NextType? So:
column A would be 1,0,0,0,0,0,0
column B would be 0,0,1,0,0,0,0
column C would be 0,0,0,0,1,0,0
column D would be 0,0,0,0,0,0,0
Note: this should be dynamic. The example above has four columns (A, B, C and D), but there could be 10, 20, or any number of columns.
Answer 0 (score: 1)
I would do it like this:
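library(dplyr)
dat = data.frame(Type = c("A","A","B","B","C","C"), NextType = c("A", "B","B", "C","C",NA), stringsAsFactors = FALSE)
# 1 where Type equals NextType for the given letter; the is.na() check keeps the NA row at 0
dat <- dat %>% mutate(A = ifelse(!is.na(NextType) & Type == NextType & Type == 'A', 1, 0),
                      B = ifelse(!is.na(NextType) & Type == NextType & Type == 'B', 1, 0),
                      C = ifelse(!is.na(NextType) & Type == NextType & Type == 'C', 1, 0))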
Answer 1 (score: 1)
Here is a method using model.matrix, diff and apply.
cbind(dat[1], apply(model.matrix(~Type-1, dat), 2, function(x) c(x[1], diff(x) > 0)))
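model.matrix(~Type-1, dat) returns a matrix of dummy variables in which each column is 1 on the rows where the corresponding value of Type is present. This is fed to apply, which takes each column and returns the column's first value followed by an evaluation of whether each successive difference is greater than 0. cbind then combines the resulting matrix with the first column of dat.
This returns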
Type TypeA TypeB TypeC
1 A 1 0 0
2 A 0 0 0
3 B 0 1 0
4 B 0 0 0
5 C 0 0 1
6 C 0 0 0
If you also want to include the second column, change dat[1] to dat[1:2].
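To make the steps described above easier to follow, here is a minimal sketch that splits the one-liner into its intermediate results. The names mm, x, first_b and res are introduced here purely for illustration, and the data is the 6-row version used in this answer's output.

```r
# Same 6-row data as in the output above (illustrative copy)
dat <- data.frame(Type = c("A", "A", "B", "B", "C", "C"),
                  NextType = c("A", "B", "B", "C", "C", NA),
                  stringsAsFactors = FALSE)

# Step 1: one dummy column per value of Type (TypeA, TypeB, TypeC)
mm <- model.matrix(~ Type - 1, dat)

# Step 2: for a single column, keep the first value and flag every 0 -> 1 jump
x <- mm[, "TypeB"]                 # 0 0 1 1 0 0
first_b <- c(x[1], diff(x) > 0)    # the logical part is coerced to 0/1

# Step 3: apply the same function to every column of the dummy matrix
res <- apply(mm, 2, function(x) c(x[1], diff(x) > 0))
```

Note that this marks the first row of each run of Type and never looks at NextType, so it reproduces the requested output only when the matching row comes first within each group, as it does in this data.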
An alternative base R method using lapply is
dat[, LETTERS[1:3]] <- lapply(unique(dat$Type),
function(x) (dat$Type == x) * !duplicated(dat$Type))
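Here we loop over the unique values of dat$Type and check, for each element of dat$Type, whether it equals this value and is not a duplicate of an earlier element. This returns a list, which is assigned to the corresponding variables in dat.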
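As a rough sketch of the same idea without the hard-coded LETTERS[1:3], the target columns could be named after the unique values of Type themselves. The names first_seen and types are introduced here for illustration only.

```r
dat <- data.frame(Type = c("A", "A", "B", "B", "C", "C"),
                  NextType = c("A", "B", "B", "C", "C", NA),
                  stringsAsFactors = FALSE)

first_seen <- !duplicated(dat$Type)   # TRUE on the first row of each Type
types <- unique(dat$Type)             # "A" "B" "C"

# One indicator column per Type value, named after that value
dat[types] <- lapply(types, function(x) (dat$Type == x) * first_seen)
```

Because the column names come from the data, this should keep working when more Type values are added, under the same assumption that the row to flag is the first row of each Type.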
Answer 2 (score: 1)
Using dplyr and tidyr:
library(dplyr); library(tidyr);
dat %>%
select(Type, NextType) %>%
mutate(key = if_else(Type == NextType & !is.na(Type) & !is.na(NextType), Type, "other"),
val = 1) %>%
spread(key, val, fill = 0) %>%
select(-other)
# Type NextType A B C
#1 A A 1 0 0
#2 A B 0 0 0
#3 B B 0 1 0
#4 B C 0 0 0
#5 C C 0 0 1
#6 C <NA> 0 0 0
Data:
dat = data.frame(Type = c("A","A","B","B","C","C"), NextType = c("A", "B","B", "C","C",NA), A = c(rep(0,6)), B = rep(0,6), C = rep(0,6) , stringsAsFactors = F)
Answer 3 (score: 0)
data.table
library(data.table)
dat = data.table(Type = c("A","A","B","B","C","C"), NextType = c("A", "B","B", "C","C",NA),
A = c(rep(0,6)), B = rep(0,6), C = rep(0,6) )
dat
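# 1 where Type equals NextType; the is.na() check keeps the NA row at 0 instead of NA
dat[Type=="A", A := as.numeric(!is.na(NextType) & Type == NextType)]
dat[Type=="B", B := as.numeric(!is.na(NextType) & Type == NextType)]
dat[Type=="C", C := as.numeric(!is.na(NextType) & Type == NextType)]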
Edit:
Dynamic (possibly not very efficient; maybe someone has another suggestion?)
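mycols <- names(dat)[!(names(dat) %in% c("Type", "NextType"))]
# Loop over the indicator columns; each one is filled only on the rows where Type matches its name
for(i in mycols){
  dat[Type==i, (i) := as.numeric(!is.na(NextType) & Type == NextType)]
}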
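One possible alternative to the loop, offered only as a sketch, is to build all indicator columns in a single := call with lapply. It uses the 7-row data from the question. Whether it is actually faster has not been measured here.

```r
library(data.table)

dat <- data.table(Type = c("A", "A", "B", "B", "C", "C", "D"),
                  NextType = c("A", "B", "B", "C", "C", "D", NA),
                  A = 0, B = 0, C = 0, D = 0)

# Every column other than Type and NextType is treated as an indicator column
mycols <- setdiff(names(dat), c("Type", "NextType"))

# Fill all indicator columns in one assignment.
# NextType %in% col is FALSE for NA, so the last row stays 0 rather than NA.
dat[, (mycols) := lapply(mycols, function(col) as.numeric(Type == col & NextType %in% col))]
```

With this data, A, B and C each get a single 1 on the row where Type and NextType both equal the column name, and D stays all 0, which is the output the question asks for.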