Create a new column by conditionally combining existing columns in R

Time: 2019-04-07 15:12:11

Tags: r

Dear members, please help me solve the following problem.

df.data

 Team_1     Team_2       Cond
  RKS         MGR          1
  MGR         RKS          2
  VSK         LSR          1
  LSR         VSK          2

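I want to create a new data frame with an extra column, New_Column: if Cond == 1, New_Column should be "Team_1 Vs Team_2"; otherwise it should be "Team_2 Vs Team_1".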

Expected result

df.Newdata

 Team_1     Team_2       Cond       New_Column    
  RKS         MGR          1           RKS Vs MGR
  MGR         RKS          2           RKS Vs MGR
  VSK         LSR          1           VSK Vs LSR
  LSR         VSK          2           VSK Vs LSR

3 answers:

Answer 0 (score: 0)

You can use ifelse here, letting the Cond column decide which team comes first in the comparison.

df$New_Column <- ifelse(df$Cond == 1,
                        paste0(df$Team_1, " Vs ", df$Team_2),
                        paste0(df$Team_2, " Vs ", df$Team_1))
df

  Team_1 Team_2 Cond New_Column
1    RKS    MGR    1 RKS Vs MGR
2    MGR    RKS    2 RKS Vs MGR
3    VSK    LSR    1 VSK Vs LSR
4    LSR    VSK    2 VSK Vs LSR
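A variant of the same idea, as a minimal self-contained sketch (not part of the original answer): use ifelse to pick the first and second team, then call paste only once instead of building two full pasted vectors. It assumes any Cond value other than 1 means "swap", which matches the question's "otherwise" wording.

```r
# Sample data from the question; stringsAsFactors = FALSE matters on R < 4.0,
# otherwise ifelse() would return factor codes instead of team names
df <- data.frame(Team_1 = c("RKS", "MGR", "VSK", "LSR"),
                 Team_2 = c("MGR", "RKS", "LSR", "VSK"),
                 Cond   = c(1L, 2L, 1L, 2L),
                 stringsAsFactors = FALSE)

# TRUE where the order must be reversed (any Cond other than 1)
swap   <- df$Cond != 1
first  <- ifelse(swap, df$Team_2, df$Team_1)
second <- ifelse(swap, df$Team_1, df$Team_2)

# paste() with its default sep = " " gives "RKS Vs MGR"
df$New_Column <- paste(first, "Vs", second)
df
```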

Answer 1 (score: 0)

Alternatively, you can run an SQL query on the data frame df with sqldf:

library(sqldf)

# Rows where Cond is neither 1 nor 2 get NA, because the CASE has no ELSE branch
df <- sqldf("SELECT Team_1, Team_2, Cond,
                    CASE
                      WHEN Cond = 1 THEN Team_1 || ' Vs ' || Team_2
                      WHEN Cond = 2 THEN Team_2 || ' Vs ' || Team_1
                    END AS New_Column
             FROM df")

Answer 2 (score: 0)

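Since the Cond column is effectively a column index (1 points to Team_1, 2 points to Team_2), you can also use row/column matrix indexing to extract the element values directly: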

# matrix of row/column index
m1 <- cbind(seq_len(nrow(df1)), df1$Cond)
# change the column index to get the value in alternate column
m2 <- cbind(m1[,1], 3 - m1[,2])
# paste the extracted values to create new column
df1$New_Column <- paste(df1[m1], "Vs", df1[m2])
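To see what the matrix indexing does, here is a small self-contained sketch on the sample data. Unlike the answer, it indexes only the two team columns (converted with as.matrix) instead of the whole data frame. `[.data.frame` handles a two-column numeric matrix index by converting the data frame with as.matrix first, so this is a stylistic choice that keeps Cond out of the conversion rather than a change in behavior. The name `teams` is introduced here for illustration.

```r
df1 <- data.frame(Team_1 = c("RKS", "MGR", "VSK", "LSR"),
                  Team_2 = c("MGR", "RKS", "LSR", "VSK"),
                  Cond   = c(1L, 2L, 1L, 2L),
                  stringsAsFactors = FALSE)

# Each row of m1 is a (row, column) pair; the column (1 or 2) comes from Cond
m1 <- cbind(seq_len(nrow(df1)), df1$Cond)
# 3 - Cond maps 1 -> 2 and 2 -> 1, i.e. the "other" team column
m2 <- cbind(m1[, 1], 3 - m1[, 2])

# Character matrix holding only the two team columns
teams <- as.matrix(df1[, c("Team_1", "Team_2")])
teams[m1]   # "RKS" "RKS" "VSK" "VSK"
teams[m2]   # "MGR" "MGR" "LSR" "LSR"

df1$New_Column <- paste(teams[m1], "Vs", teams[m2])
```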

Benchmark

The elapsed times differ only slightly on a large dataset (the 4 sample rows replicated 1e7 times, i.e. 4e7 rows):

df2 <- df1[rep(seq_len(nrow(df1)), 1e7), ]

system.time({
    m1 <- cbind(seq_len(nrow(df2)), df2$Cond)
    m2 <- cbind(m1[,1], 3 - m1[,2])
    paste(df2[m1], "Vs", df2[m2])
})
#   user  system elapsed 
# 25.926   2.548  28.983 

system.time({
    ifelse(df2$Cond == 1,
           paste0(df2$Team_1, " Vs ", df2$Team_2),
           paste0(df2$Team_2, " Vs ", df2$Team_1))
})
#   user  system elapsed 
# 28.446   1.934  30.542 
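Timings are only comparable if both approaches return the same strings. A quick sanity check, as a sketch: the 1e3 replication factor is an arbitrary choice to keep it fast, the names `res_index` and `res_ifelse` are hypothetical, and no timings are claimed here.

```r
df1 <- data.frame(Team_1 = c("RKS", "MGR", "VSK", "LSR"),
                  Team_2 = c("MGR", "RKS", "LSR", "VSK"),
                  Cond   = c(1L, 2L, 1L, 2L),
                  stringsAsFactors = FALSE)

# Replicate the 4 sample rows (1e3 copies instead of 1e7)
df2 <- df1[rep(seq_len(nrow(df1)), 1e3), ]

# Matrix-index approach (answer 2)
m1 <- cbind(seq_len(nrow(df2)), df2$Cond)
m2 <- cbind(m1[, 1], 3 - m1[, 2])
res_index <- paste(df2[m1], "Vs", df2[m2])

# ifelse approach (answer 0)
res_ifelse <- ifelse(df2$Cond == 1,
                     paste0(df2$Team_1, " Vs ", df2$Team_2),
                     paste0(df2$Team_2, " Vs ", df2$Team_1))

# Both should produce exactly the same character vector
stopifnot(identical(res_index, res_ifelse))
```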

Data

df1 <- structure(list(Team_1 = c("RKS", "MGR", "VSK", "LSR"), Team_2 = c("MGR", 
  "RKS", "LSR", "VSK"), Cond = c(1L, 2L, 1L, 2L)), class = "data.frame",
 row.names = c(NA, -4L))
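If you would rather rebuild the question's df.data from its printed table than type out structure(), base R's read.table can parse it from a text string. A minimal sketch, assuming the columns are whitespace-separated and no value contains a space (the variable name `txt` is hypothetical):

```r
# The printed table from the question, pasted as text
txt <- "
Team_1 Team_2 Cond
RKS    MGR    1
MGR    RKS    2
VSK    LSR    1
LSR    VSK    2
"

# Blank lines are skipped by default; Cond is read as integer
df.data <- read.table(text = txt, header = TRUE, stringsAsFactors = FALSE)
str(df.data)
```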