Apply a formula to some columns using other columns

Time: 2017-07-25 16:54:42

Tags: r dplyr

I have this data frame:

  dat = data.frame(Type = c("A","A","B","B","C","C","D"), NextType = c("A", "B","B", "C","C","D",NA), 
                 A = c(rep(0,7)), 
                 B = rep(0,7), 
                 C = rep(0,7) , 
                 D = rep(0,7),
                 stringsAsFactors = F)
dat

  Type NextType A B C D
1    A        A 0 0 0 0
2    A        B 0 0 0 0
3    B        B 0 0 0 0
4    B        C 0 0 0 0
5    C        C 0 0 0 0
6    C        D 0 0 0 0
7    D     <NA> 0 0 0 0

What is the best way to fill columns A, B, C, and D with a 1 when the column name (A, B, C, D, etc.) = Type = NextType?

So:

column A would be 1,0,0,0,0,0,0
column B would be 0,0,1,0,0,0,0
column C would be 0,0,0,0,1,0,0
column D would be 0,0,0,0,0,0,0

Note: this should be dynamic. Here I have four columns (A, B, C, and D), but there could be 10, 20, or any number of columns.

4 answers:

Answer 0 (score: 1)

I would do it like this:

library(tidyr)
library(dplyr)
dat = data.frame(Type = c("A","A","B","B","C","C"), NextType = c("A", "B","B", "C","C",NA))
# 1 where Type equals NextType and both equal the column name, otherwise 0
dat <- dat %>%
  mutate(A = ifelse(Type == NextType & Type == 'A', 1, 0),
         B = ifelse(Type == NextType & Type == 'B', 1, 0),
         C = ifelse(Type == NextType & Type == 'C', 1, 0))
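Two caveats with the ifelse approach: the column names are hard-coded, and a row whose NextType is NA gets NA rather than 0 in the column matching its Type. A minimal base R sketch that avoids both, assuming the indicator columns are every column other than Type and NextType (the names dat_q and type_cols are only illustrative):

```r
# The 7-row data from the question, under an illustrative name
dat_q <- data.frame(Type = c("A","A","B","B","C","C","D"),
                    NextType = c("A","B","B","C","C","D",NA),
                    A = 0, B = 0, C = 0, D = 0,
                    stringsAsFactors = FALSE)

# Assumption: every column except Type and NextType is an indicator column
type_cols <- setdiff(names(dat_q), c("Type", "NextType"))

# %in% returns FALSE (never NA) when NextType is missing
dat_q[type_cols] <- lapply(type_cols, function(col)
  as.numeric(dat_q$Type %in% col & dat_q$NextType %in% col))
dat_q
```

Because the column list is derived from names(dat_q), the same code should work unchanged with 10 or 20 indicator columns.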

Answer 1 (score: 1)

Here is a method using model.matrix, diff, and apply.

cbind(dat[1], apply(model.matrix(~Type-1, dat), 2, function(x) c(x[1], diff(x) > 0)))

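model.matrix(~Type-1, dat) returns a matrix of dummy variables in which each column is 1 in the rows where the corresponding value is present. This is fed to apply, which takes each column and returns the column's first value followed by an evaluation of whether each successive difference is greater than 0. The resulting matrix is then combined with the first column using cbind.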
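To see the two steps separately, here is a small sketch on toy data (the names toy, mm, and first_of_run are illustrative, not from the answer). Note that only Type enters the computation; NextType is never consulted, so the result marks the first row of each run of a Type value.

```r
toy <- data.frame(Type = c("A","A","B","B","C","C"),
                  stringsAsFactors = FALSE)

# Step 1: one dummy column per Type value (TypeA, TypeB, TypeC)
mm <- model.matrix(~ Type - 1, toy)

# Step 2: keep the first value, then flag positions where the column
# steps up from 0 to 1 (diff > 0); the logical values are coerced to 0/1
first_of_run <- apply(mm, 2, function(x) c(x[1], diff(x) > 0))
first_of_run
```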

This returns

  Type TypeA TypeB TypeC
1    A     1     0     0
2    A     0     0     0
3    B     0     1     0
4    B     0     0     0
5    C     0     0     1
6    C     0     0     0

If you also want to include the second column, change dat[1] to dat[1:2].

An alternative base R method using lapply is

dat[, LETTERS[1:3]] <- lapply(unique(dat$Type),
                              function(x) (dat$Type == x) * !duplicated(dat$Type))

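Here, we loop over the unique values of dat$Type and check, for each element of dat$Type, whether it equals that value and whether it is not a duplicate (that is, whether it is the first occurrence). This returns a list, which is assigned to the corresponding variables in dat.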
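The LETTERS[1:3] on the left-hand side fixes the number of columns at three. A possible variation, assuming the target columns should simply be named after the unique Type values (the names toy and types are illustrative):

```r
toy <- data.frame(Type = c("A","A","B","B","C","C"),
                  stringsAsFactors = FALSE)

# Derive the column names from the data instead of hard-coding them
types <- unique(toy$Type)

# 1 only where Type equals the value AND the row is its first occurrence
toy[types] <- lapply(types,
                     function(x) (toy$Type == x) * !duplicated(toy$Type))
toy
```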

Answer 2 (score: 1)

Using dplyr and tidyr:

library(dplyr); library(tidyr);

dat %>% 
    select(Type, NextType) %>% 
    mutate(key = if_else(Type == NextType & !is.na(Type) & !is.na(NextType), Type, "other"), 
           val = 1) %>% 
    spread(key, val, fill = 0) %>% 
    select(-other)

#  Type NextType A B C
#1    A        A 1 0 0
#2    A        B 0 0 0
#3    B        B 0 1 0
#4    B        C 0 0 0
#5    C        C 0 0 1
#6    C     <NA> 0 0 0
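In current tidyr, spread() is superseded by pivot_wider(). A sketch of the same idea with the newer function, assuming a tidyr version that provides pivot_wider() with a scalar values_fill (the names ex and res are illustrative):

```r
library(dplyr)
library(tidyr)

ex <- data.frame(Type = c("A","A","B","B","C","C"),
                 NextType = c("A","B","B","C","C",NA),
                 stringsAsFactors = FALSE)

res <- ex %>%
  # key is the Type when both columns agree, otherwise a throwaway label
  mutate(key = if_else(!is.na(NextType) & Type == NextType, Type, "other"),
         val = 1) %>%
  # one column per key; cells with no matching row are filled with 0
  pivot_wider(names_from = key, values_from = val, values_fill = 0) %>%
  select(-other)
res
```

As with spread(), a Type that never equals its NextType produces no column of its own, so such columns would have to be added separately if they are required.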

Data

dat = data.frame(Type = c("A","A","B","B","C","C"), NextType = c("A", "B","B", "C","C",NA), A = c(rep(0,6)), B = rep(0,6), C = rep(0,6) , stringsAsFactors = F)

Answer 3 (score: 0)

data.table

library(data.table)
dat = data.table(Type = c("A","A","B","B","C","C"), NextType = c("A", "B","B", "C","C",NA), 
             A = c(rep(0,6)), B = rep(0,6), C = rep(0,6) )
dat

dat[Type=="A", A:=(Type == NextType)]
dat[Type=="B", B:=(Type == NextType)]
dat[Type=="C", C:=(Type == NextType)]

Edit

A dynamic version (possibly not very efficient; perhaps someone has another suggestion?):

mycols <- names(dat)[!(names(dat) %in% c("Type", "NextType"))]
for(i in mycols){
  dat[Type==i, (i) := (Type==NextType)]
}
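One possible variation on the loop, offered as a sketch rather than a benchmarked improvement: data.table's set() assigns a whole column by reference, and %in% avoids the NA that Type == NextType should produce when NextType is missing. The names dt and ind_cols are illustrative.

```r
library(data.table)

dt <- data.table(Type = c("A","A","B","B","C","C"),
                 NextType = c("A","B","B","C","C",NA),
                 A = 0, B = 0, C = 0)

# Assumption: every column except Type and NextType is an indicator column
ind_cols <- setdiff(names(dt), c("Type", "NextType"))

for (col in ind_cols) {
  # 1 where both Type and NextType equal the column name, otherwise 0
  set(dt, j = col,
      value = as.numeric(dt$Type %in% col & dt$NextType %in% col))
}
dt
```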