Splitting a column into multiple fields with R

Date: 2017-08-15 02:30:12

Tags: r dplyr text-mining stringr text-analysis

My CSV has a column called "features". Its fields contain data in this format:

{""Air conditioning"",""Elevator"",""Smoke detector""}
{""Air conditioning"",""Railing Lights"",""Smoke detector""}
{""Air conditioning"",""Washer"",""Dryer"",""Smoke detector""}

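There are 20,000 records, and these strings appear inside the "features" field in no particular order.

How can I split them into separate columns so that every "Air conditioning" ends up in column 1, every "Elevator" in column 2, and so on?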

a                 b          c               d
air conditioning  elevators  smokedetectors
air conditioning  elevators  smokedetectors  washer
air conditioning  elevators  smokedetectors  washer

2 answers:

Answer 0 (score: 0)

来自separate的{​​{1}}和来自tidyr的{​​{1}}(投放mutate_at)的组合:

dplyr

给出

gsub

请注意,合并额外字段(如第三条记录中所示),请查看dfr <- data.frame(features = c('{""Air conditioning"",""Elevator"",""Smoke detector""}', '{""Air conditioning"",""Railing Lights"",""Smoke detector""}', '{""Air conditioning"",""Washer"",""Dryer"",""Smoke detector""}')) library(tidyr) library(dplyr) # Remove {,}, and quotes (") fix_txt <- function(x)gsub("[{]\"|\"|[}]", "", x) separate(dfr, features, c("a","b","c"), sep=",", extra="merge") %>% mutate_at(vars(a:c), fix_txt) 以获取更多选项。
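Because separate splits by position, the same feature can land in different columns (for example, "Railing Lights" and "Washer" both end up in b). If each feature should get its own column, as the question asks, one option is to split the records into rows and pivot them back to wide format. This is a minimal sketch, assuming tidyr >= 1.0 (for pivot_wider); the id and value columns are helpers introduced here for illustration.

```r
library(dplyr)
library(tidyr)

dfr <- data.frame(features = c('{""Air conditioning"",""Elevator"",""Smoke detector""}',
                               '{""Air conditioning"",""Railing Lights"",""Smoke detector""}',
                               '{""Air conditioning"",""Washer"",""Dryer"",""Smoke detector""}'),
                  stringsAsFactors = FALSE)

wide <- dfr %>%
  # Hypothetical helper: remember which record each item came from
  mutate(id = row_number(),
         features = gsub("[{}\"]", "", features)) %>%
  # One row per (record, feature) pair
  separate_rows(features, sep = ",") %>%
  # Keep the feature name as the cell value
  mutate(value = features) %>%
  # One column per distinct feature; NA where a record lacks it
  pivot_wider(id_cols = id, names_from = features, values_from = value)

wide
```

Columns appear in order of first appearance; pivot_wider(names_sort = TRUE) sorts them alphabetically instead.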

Answer 1 (score: 0)

As already mentioned, you can look at the "splitstackshape" package, specifically the cSplit_e function. With it, you could try:

library(splitstackshape)
library(data.table)  # provides as.data.table() and :=
cSplit_e(as.data.table(dfr)[, features := (gsub("[{}\"]", "", features))], 
         "features", ",", mode = "value", type = "character", drop = TRUE)
##    features_Air conditioning features_Dryer features_Elevator features_Railing Lights features_Smoke detector features_Washer
## 1:          Air conditioning             NA          Elevator                      NA          Smoke detector              NA
## 2:          Air conditioning             NA                NA          Railing Lights          Smoke detector              NA
## 3:          Air conditioning          Dryer                NA                      NA          Smoke detector          Washer

where "dfr" is defined as in @Remko's answer:

dfr <- data.frame(features = c('{""Air conditioning"",""Elevator"",""Smoke detector""}',
                               '{""Air conditioning"",""Railing Lights"",""Smoke detector""}',
                               '{""Air conditioning"",""Washer"",""Dryer"",""Smoke detector""}'))
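If installing extra packages isn't an option, the same one-column-per-feature layout that cSplit_e produces can be built in base R. This is a minimal sketch, assuming the column is read as character (stringsAsFactors = FALSE) and that items are separated by commas with no surrounding spaces; all_features and wide are illustrative names.

```r
dfr <- data.frame(features = c('{""Air conditioning"",""Elevator"",""Smoke detector""}',
                               '{""Air conditioning"",""Railing Lights"",""Smoke detector""}',
                               '{""Air conditioning"",""Washer"",""Dryer"",""Smoke detector""}'),
                  stringsAsFactors = FALSE)

# Strip braces and quotes, then split each record on commas
items <- strsplit(gsub("[{}\"]", "", dfr$features), ",", fixed = TRUE)

# Every distinct feature becomes one column, in order of first appearance
all_features <- unique(unlist(items))

# For each record, keep the feature name where present and NA otherwise;
# vapply returns one column per record, so transpose to one row per record
wide <- t(vapply(items,
                 function(x) ifelse(all_features %in% x, all_features, NA_character_),
                 character(length(all_features))))
colnames(wide) <- all_features

wide
```

The result is a character matrix; wrap it in as.data.frame() if a data frame is needed.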