Assign a value to a new column if another column contains an exact string

Time: 2017-04-11 12:55:58

Tags: r

My df looks like this.

Date    Winner
4/12    Tom
4/13    Abe
4/14    George
4/15    Tom

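I want to add new columns that hold 1 if the name appears in the Winner column and 0 if it does not, with the reverse for the Lose columns. Ideally, the df would look like this:

Date    Winner    Tom_Win    Tom_Lose    Abe_Win    Abe_Lose    George_Win    George_Lose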
4/12    Tom       1          0           0          1           0           1
4/13    Abe       0          1           1          0           0           1
4/14    George    0          1           0          1           1           0  
4/15    Tom       1          0           0          1           0           1

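Is there a simple way to achieve this?

4 answers:

Answer 0 (score: 2)

This is quite simple with the model.matrix function: it creates N dummy columns, one per name, each holding 0 when the name does not appear and 1 when it does (exactly what you asked for). The code is as follows (assuming your data is called db):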

> winners <- model.matrix(~Winner - 1, data=db)
> winners

  WinnerAbe WinnerGeorge WinnerTom
1         0            0         1
2         1            0         0
3         0            1         0
4         0            0         1

This bit computes the columns with the losing values:

winners <- as.data.frame(winners)
winners$loserAbe <- as.numeric(!winners$WinnerAbe) #naturally you have to 
                                                   #do this for every column you need
  WinnerAbe WinnerGeorge WinnerTom loserAbe
1         0            0         1        1
2         1            0         0        0
3         0            1         0        1
4         0            0         1        1

winners$Date <- db$Date #this last bit so you don't lose the date.
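
The loser columns above are built one at a time. As a minimal sketch, assuming db holds the sample data from the question, all of them can be derived at once by subtracting the indicator columns from 1. The names losers and result, and the "loser" prefix handling via sub(), are illustrative choices rather than part of the original answer.

```r
# Sample data reconstructed from the question (assumed to be what `db` holds)
db <- data.frame(
  Date   = c("4/12", "4/13", "4/14", "4/15"),
  Winner = c("Tom", "Abe", "George", "Tom")
)

# One 0/1 indicator column per distinct winner
winners <- as.data.frame(model.matrix(~ Winner - 1, data = db))

# Losing indicators are the complement of the winning ones
losers <- 1 - winners
names(losers) <- sub("^Winner", "loser", names(winners))

# Put the date back in front so it is not lost
result <- cbind(Date = db$Date, winners, losers)
print(result)
```

Because every value in winners is 0 or 1, 1 - winners flips all columns in a single step, so nothing has to be repeated per name.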

Answer 1 (score: 2)

Using mtabulate from the qdapTools package, we can do this in three steps:

library(qdapTools)

d1 <- mtabulate(d3$Winner)

d2 <- setNames(data.frame(sapply(d1, function(i) ifelse(i == 1, 0, 1))), 
                                                       paste0(names(d1), '_Lose'))

cbind(d3$Date, d1, d2)

#  d3$Date Abe George Tom Abe_Lose George_Lose Tom_Lose
#1    4/12   0      0   1        1           1        0
#2    4/13   1      0   0        0           1        1
#3    4/14   0      1   0        1           0        1
#4    4/15   0      0   1        1           1        0

Data

str(d3)
'data.frame':   4 obs. of  2 variables:
 $ Date  : Factor w/ 4 levels "4/12","4/13",..: 1 2 3 4
 $ Winner: Factor w/ 3 levels "Abe","George",..: 3 1 2 3
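
If installing qdapTools is not an option, a similar indicator table can be sketched in base R with table(). This is an alternative technique, not what the answer above uses; it assumes d3 has the structure shown by str(d3), and the names wins, loses and out are illustrative.

```r
# Rebuild d3 to match the str() output above (both columns are factors)
d3 <- data.frame(
  Date   = factor(c("4/12", "4/13", "4/14", "4/15")),
  Winner = factor(c("Tom", "Abe", "George", "Tom"))
)

# Cross-tabulate row index against winner: one row per game, one column per name
wins <- as.data.frame.matrix(table(seq_len(nrow(d3)), d3$Winner))

# Complement of the win indicators, with a _Lose suffix on the names
loses <- setNames(1 - wins, paste0(names(wins), "_Lose"))

out <- cbind(Date = d3$Date, wins, loses)
print(out)
```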

Answer 2 (score: 1)

I'm sure there is a better way than this, but it works in base R and is quite simple.

If your data looks like this:

df <- data.frame(Date = c("4/12","4/13","4/14","4/15"),Winner = c("Tom","Abe","George","Tom"))

Append the extra columns as follows:

xcols <- c(paste0(unique(df$Winner), '_Win'), paste0(unique(df$Winner), '_Lose'))
df[ , xcols] <- 0

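Now build a character vector of assignment statements that give each player their points. The Lose columns need statements too, otherwise they stay at 0.

evl <- unlist(lapply(unique(df$Winner), function(x) {
  c(paste0('df[', which(df$Winner == x), ',', which(names(df) == paste0(x, '_Win')), '] <- 1'),    # rows this player won
    paste0('df[', which(df$Winner != x), ',', which(names(df) == paste0(x, '_Lose')), '] <- 1'))   # rows this player lost
}))

Execute the code: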

eval(parse(text = evl))
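
The same columns can also be filled without building and evaluating code strings. The following is a minimal sketch, assuming df is the sample data frame defined in this answer; the loop variable p is illustrative.

```r
# Sample data as defined in this answer
df <- data.frame(Date = c("4/12", "4/13", "4/14", "4/15"),
                 Winner = c("Tom", "Abe", "George", "Tom"))

# For each distinct player, add a 0/1 win column and its complement.
# as.character() keeps this working whether Winner is a factor or a character vector.
for (p in as.character(unique(df$Winner))) {
  df[[paste0(p, "_Win")]]  <- as.integer(df$Winner == p)
  df[[paste0(p, "_Lose")]] <- as.integer(df$Winner != p)
}

print(df)
```

Assigning whole columns through a logical comparison avoids eval(parse()), which is harder to debug, and the columns come out in the Win/Lose order the question asks for.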

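Answer 3 (score: 1)

df <- data.frame(
  Date = c("4/12", "4/13","4/14", "4/15"),
  Winner = c("Tom", "Abe", "George", "Tom"),
  stringsAsFactors = TRUE  # levels(df$Winner) below needs a factor; R >= 4.0 no longer converts strings by default
)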


df2 <- do.call(cbind,
      lapply(seq_along(levels(df$Winner)), function(x) {

         win  <- ifelse(df$Winner == levels(df$Winner)[x], 1, 0)
         lose <- ifelse(df$Winner == levels(df$Winner)[x], 0, 1)

         dat <- cbind(win, lose)
         colnames(dat) <-  c(paste(levels(df$Winner)[x], "win", sep = "_"),  paste(levels(df$Winner)[x], "lose", sep = "_"))

         dat
     })
)


cbind(df, df2)


> cbind(df, df2)
  Date Winner Abe_win Abe_lose George_win George_lose Tom_win Tom_lose
1 4/12    Tom       0        1          0           1       1        0
2 4/13    Abe       1        0          0           1       0        1
3 4/14 George       0        1          1           0       0        1
4 4/15    Tom       0        1          0           1       1        0