Convert a comma-separated list into dummy variables

Time: 2017-02-20 23:37:31

Tags: r grep string-matching dummy-variable

I have a table like this:

library(data.table)

yel <- data.table(
  id    = c(1, 2, 3),
  names = c("\"parking space\", \"dining\", \"3bh\"",
            "\"parking\" , \"outdoor\"",
            "\"Hello!\",\"dining room\",\"3bh\"")
)
yel

   id                            names
1:  1 "parking space", "dining", "3bh"
2:  2            "parking" , "outdoor"
3:  3     "Hello!","dining room","3bh"

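I want to dummify the names variable and merge entries that mean the same thing, e.g. "parking space" with "parking", and "dining room" with "dining".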

I want the dummy variables to be: parking, dining, 3bh, outdoor, Hello. Is there a way to do this?
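One way to pin down what "the same word" means is an explicit lookup table instead of a regex. This is a minimal sketch, assuming the only variants are the ones in the sample data; `canon`, `normalize()` and `entries` are hypothetical names, and the synonym list would have to be extended by hand for real data.

```r
# Hypothetical synonym table: each variant maps to its canonical dummy name
canon <- c("parking space" = "parking",
           "dining room"   = "dining",
           "Hello!"        = "Hello")

normalize <- function(x) {
  # Drop the embedded double quotes, then trim surrounding whitespace
  x <- trimws(gsub('"', "", x, fixed = TRUE))
  # Replace only the entries that appear in the lookup table
  hit <- x %in% names(canon)
  x[hit] <- canon[x[hit]]
  x
}

# Entries as they look after splitting the question's strings on ","
entries <- c("\"parking space\"", " \"dining\"", "\"parking\" ",
             "\"dining room\"", "\"Hello!\"", "\"3bh\"", " \"outdoor\"")
normalize(entries)
```

Unlike a regex that deletes " space" or "room" everywhere, a lookup only touches the variants you list, so an unrelated term containing "room" is left alone.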

1 answer

Answer 0 (score: 0)

How about this? (The regex may still need some tweaking; it doesn't look general enough.) Using tidyr:

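library(tidyr); library(dplyr)  # separate_rows(); mutate() and %>%
df1 <- separate_rows(yel, names, sep = ",")  # one row per comma-separated entry
df1 %>% mutate(newnames = gsub('\\"| space|\\!| |room', "", names))  # strip quotes, "!", spaces, " space", "room"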

  id           names newnames
1  1 "parking space"  parking
2  1        "dining"   dining
3  1           "3bh"      3bh
4  2      "parking"   parking
5  2       "outdoor"  outdoor
6  3        "Hello!"    Hello
7  3   "dining room"   dining
8  3           "3bh"      3bh
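The answer stops at `newnames`; the remaining step is to turn each (id, term) pair into a 0/1 column. Below is a minimal sketch of that step in base R, swapping tidyr's `separate_rows()` for `strsplit()` and using `table()` to cross-tabulate, so it runs without extra packages. It rebuilds the question's data as a plain data.frame and reuses the answer's regex unchanged; `long` and `dummies` are hypothetical names.

```r
# The question's data, rebuilt as a plain data.frame
yel <- data.frame(
  id = c(1, 2, 3),
  names = c("\"parking space\", \"dining\", \"3bh\"",
            "\"parking\" , \"outdoor\"",
            "\"Hello!\",\"dining room\",\"3bh\""),
  stringsAsFactors = FALSE
)

# Split each string on commas and repeat each id once per entry (long format)
parts <- strsplit(yel$names, ",", fixed = TRUE)
long <- data.frame(
  id = rep(yel$id, lengths(parts)),
  names = unlist(parts),
  stringsAsFactors = FALSE
)

# Same normalisation as the answer's regex
long$newnames <- gsub('\\"| space|\\!| |room', "", long$names)

# Cross-tabulate id x term, then cap counts at 1 to get 0/1 dummies
dummies <- as.data.frame.matrix(table(long$id, long$newnames))
dummies[] <- lapply(dummies, function(x) as.integer(x > 0))

# Put the id back as a regular column
dummies <- cbind(id = as.numeric(rownames(dummies)), dummies)
rownames(dummies) <- NULL
dummies
```

The term columns come out in the sort order of the current locale, so refer to them by name rather than by position. Capping at 1 matters only if an id lists the same term twice.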