How to quickly create dummy variables from a list in R

Time: 2016-08-06 19:48:57

Tags: r dummy-variable

So I'm new to R, and I've run into what should be a fairly simple task. I have a data frame called "Data" that looks like this:

           Group       Score.Diff
Row 1   Kyle, Steve      15
Row 2   Matthew, Tony    12 
...     ...              ...            
Row n   Anthony, Zack    -10

I also have a data frame called "Player.Names" that contains every unique name occurring anywhere in Data$Group, like this:

        Names
Row 1   Anthony
Row 2   Kyle
...     ...
Row n   Zack
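If Player.Names does not exist yet, one way to derive it from Data$Group is to split every Group string into individual names and keep the distinct ones. This is a minimal sketch, assuming names are always separated by a comma plus optional whitespace; the toy `Data` below only mirrors the rows shown in the question and is hypothetical.

```r
# Hypothetical toy version of the question's "Data" table
Data <- data.frame(
  Group = c("Kyle, Steve", "Matthew, Tony", "Anthony, Zack"),
  Score.Diff = c(15, 12, -10),
  stringsAsFactors = FALSE
)

# Split each Group string on a comma plus optional whitespace (assumed separator),
# flatten the pieces into one vector, then keep the distinct names in sorted order
all_names <- sort(unique(unlist(strsplit(Data$Group, ",\\s*"))))
Player.Names <- data.frame(Names = all_names, stringsAsFactors = FALSE)
print(Player.Names)
```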

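What I'm trying to do is add a new column to "Data" for each unique name, containing 1 if that name appears in the row's Data$Group and 0 if it does not. The desired output looks like this: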

           Group       Score.Diff  Anthony  Kyle  Steve ...  Zack
Row 1   Kyle, Steve      15           0      1     1    ...   0
Row 2   Matthew, Tony    12           0      0     0    ...   0
...     ...              ...         ...    ...   ...   ...  ...
Row n   Anthony, Zack    -10          1      0     0    ...   1

1 Answer:

Answer 0 (score: 0)

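We can loop over the 'Names' column of the second dataset ('df2') with sapply, using each name as the pattern in grepl to get a logical vector for the 'Group' column of the first dataset ('df1'). Coerce that to binary with as.integer, then cbind the result with 'df1'. Note that grepl matches substrings, so a name such as "Tony" would also match "Tonya".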

cbind(df1, sapply(df2$Names, function(x) as.integer(grepl(x, df1$Group))))
#               Group Score.Diff Anthony Kyle Zack
#Row 1   Kyle, Steve         15       0    1    0
#Row 2 Matthew, Tony         12       0    0    0
#Row n Anthony, Zack        -10       1    0    1
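Because grepl does substring matching, one possible refinement that keeps the answer's structure is to wrap each name in `\\b` word-boundary anchors. The sketch below is illustrative, not part of the original answer: the extra "Tonya, Kyle" row and the "Tony" entry in `names_vec` are hypothetical, added only to show the false positive. It also assumes the names contain no regex metacharacters (otherwise they would need escaping first).

```r
df1 <- data.frame(
  Group = c("Kyle, Steve", "Matthew, Tony", "Anthony, Zack", "Tonya, Kyle"),
  Score.Diff = c(15, 12, -10, 3),
  stringsAsFactors = FALSE
)
names_vec <- c("Anthony", "Kyle", "Tony", "Zack")  # hypothetical name list

# Plain substring match, as in the answer: "Tony" also matches "Tonya"
naive <- sapply(names_vec, function(x) as.integer(grepl(x, df1$Group)))

# Anchored match: \\b is a word boundary, so "Tony" no longer matches "Tonya"
anchored <- sapply(names_vec, function(x)
  as.integer(grepl(paste0("\\b", x, "\\b"), df1$Group, perl = TRUE)))

result <- cbind(df1, anchored)
print(result)
```

Here `naive[, "Tony"]` flags both the "Matthew, Tony" and "Tonya, Kyle" rows, while `anchored[, "Tony"]` flags only the first.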

Data

df1 <- structure(list(Group = c("Kyle, Steve", "Matthew, Tony", "Anthony, Zack"),
                      Score.Diff = c(15L, 12L, -10L)),
                 .Names = c("Group", "Score.Diff"),
                 class = "data.frame",
                 row.names = c("Row 1", "Row 2", "Row n"))

df2 <- structure(list(Names = c("Anthony", "Kyle", "Zack")),
                 .Names = "Names",
                 class = "data.frame",
                 row.names = c("Row 1", "Row 2", "Row n"))
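An alternative that avoids pattern matching entirely is to split each Group into its individual names once and then test exact membership with `%in%`. This is a swapped-in technique, not the answer's grepl method, and it assumes names are separated by a comma plus optional whitespace. The data frames below are rebuilt with `data.frame()` but contain the same values as `df1` and `df2` above.

```r
df1 <- data.frame(
  Group = c("Kyle, Steve", "Matthew, Tony", "Anthony, Zack"),
  Score.Diff = c(15L, 12L, -10L),
  row.names = c("Row 1", "Row 2", "Row n"),
  stringsAsFactors = FALSE
)
df2 <- data.frame(Names = c("Anthony", "Kyle", "Zack"), stringsAsFactors = FALSE)

# One character vector of names per row (assumed ", " style separator)
tokens <- strsplit(df1$Group, ",\\s*")

# For each name, check exact membership in every row's names;
# vapply fixes the return type, giving an nrow(df1) x length(Names) integer matrix
dummies <- vapply(
  df2$Names,
  function(nm) vapply(tokens, function(tk) as.integer(nm %in% tk), integer(1)),
  integer(nrow(df1))
)

result <- cbind(df1, dummies)
print(result)
```

Since matching is on whole names, "Tony" would not match "Tonya", and no regex escaping is needed.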