I have a table like this:
library(data.table)
yel <- data.table(id=c(1,2,3))
yel$names[1] <- "\"parking space\", \"dining\", \"3bh\""
yel$names[2] <- "\"parking\" , \"outdoor\""
yel$names[3] <- "\"Hello!\",\"dining room\",\"3bh\""
yel
   id                            names
1:  1 "parking space", "dining", "3bh"
2:  2            "parking" , "outdoor"
3:  3     "Hello!","dining room","3bh"
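I want to dummify the names variable and merge similar words, for example "parking space" with "parking", and "dining" with "dining room".
The dummy variables I want are: parking, dining, 3bh, outdoor, Hello. Is there a way to do this?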
Answer 0 (score: 0)
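How about this? The regular expression may still need some tweaking, since it does not look very general. Using tidyr (with dplyr for mutate and %>%):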
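library(tidyr)
library(dplyr)
df1 <- separate_rows(yel, names, sep = ",")  # one row per comma-separated item
df1 %>% mutate(newnames = gsub('"| space|!| |room', "", names))  # strip quotes, blanks, "!", " space", "room"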
  id           names newnames
1  1 "parking space"  parking
2  1        "dining"   dining
3  1           "3bh"      3bh
4  2       "parking"  parking
5  2       "outdoor"  outdoor
6  3        "Hello!"    Hello
7  3   "dining room"   dining
8  3           "3bh"      3bh
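The result above is still in long format, while the question asks for one dummy column per cleaned name. A minimal sketch of that last step, assuming tidyr's pivot_wider is acceptable here; the object names long and dummies and the helper column value are hypothetical and not part of the original answer:

```r
library(data.table)
library(tidyr)
library(dplyr)

# Same data as in the question, built in one call.
yel <- data.table(
  id = c(1, 2, 3),
  names = c('"parking space", "dining", "3bh"',
            '"parking" , "outdoor"',
            '"Hello!","dining room","3bh"')
)

# Long format: one row per item, cleaned with the regex from the answer.
# as.data.frame() is only a precaution so tidyr sees a plain data frame.
long <- as.data.frame(yel) %>%
  separate_rows(names, sep = ",") %>%
  mutate(newnames = gsub('"| space|!| |room', "", names))

# Wide format: one 0/1 column per distinct cleaned name.
dummies <- long %>%
  distinct(id, newnames) %>%        # guard against duplicates within an id
  mutate(value = 1L) %>%            # hypothetical helper column
  pivot_wider(id_cols = id,
              names_from = newnames,
              values_from = value,
              values_fill = 0L)     # absent names become 0

print(dummies)
```

The regex still hard-codes which words to merge ("space", "room"), so it would need to be extended for other data.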