Split a column of comma-separated strings into new binary indicator columns in R

Time: 2018-02-24 06:53:31

Tags: r string reshape

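I have a data frame in which one column contains comma-separated strings. I would like to know whether there is an efficient way to turn each distinct comma-separated value into a new column header, with the new columns holding a binary indicator of whether that value appeared in the original row. A sample of my data can be reproduced with: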

data <- structure(list(
  id = c(6901257L, 6304928L, 7919400L),
  amenities = c(
    "Wireless Internet,Air conditioning,Kitchen,Heating,Family/kid friendly,Essentials,Hair dryer,Iron,translation missing: en.hosting_amenity_50",
    "Wireless Internet,Air conditioning,Kitchen,Heating,Family/kid friendly,Washer,Dryer,Smoke detector,Fire extinguisher,Essentials,Shampoo,Hangers,Hair dryer,Iron,translation missing: en.hosting_amenity_50",
    "TV,Cable TV,Wireless Internet,Air conditioning,Kitchen,Breakfast,Buzzer/wireless intercom,Heating,Family/kid friendly,Smoke detector,Carbon monoxide detector,Fire extinguisher,Essentials,Shampoo,Hangers,Hair dryer,Iron,Laptop friendly workspace,translation missing: en.hosting_amenity_50"
  )
), .Names = c("id", "amenities"), class = "data.frame", row.names = c(NA, 3L))

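I have an inefficient way of producing the result: reshape the data to long format, then use dcast from reshape2. The inefficient approach can be reproduced as follows: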

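library(dplyr)
library(tidyr)

# Split each amenities string into a list column, then expand to one row per amenity
data.long <- data %>%
  mutate(amenities = strsplit(as.character(amenities), ",")) %>%
  unnest(amenities)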

data.long$amenities.value <- 1

data.wide <- reshape2::dcast(data.long, id ~ amenities, value.var = "amenities.value") # desired result
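
Note that reshaping this way should leave NA, not 0, in the cells where an id lacks an amenity, since dcast fills missing combinations with NA when no aggregation function is given. If strict 0/1 columns are wanted, a minimal sketch on a made-up two-row result (wide.toy and its values are hypothetical, not taken from the data above) could be:

```r
# Hypothetical toy result in the same shape as data.wide: 1 = present, NA = absent
wide.toy <- data.frame(id = c(1L, 2L), TV = c(1, NA), Kitchen = c(NA, 1))

# Leave the id column untouched; replace NA with 0 in the indicator columns only
indicator.cols <- setdiff(names(wide.toy), "id")
wide.toy[indicator.cols][is.na(wide.toy[indicator.cols])] <- 0

print(wide.toy)
```

Restricting the replacement to the indicator columns avoids silently overwriting a genuinely missing id.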

Is there a more efficient way to get the desired result from the original data structure?

2 answers:

Answer 0: (score: 2)

Here is an approach using the splitstackshape library:

library(splitstackshape) 
library(tidyverse)

# `data` is the data frame from the question
cSplit(data, "amenities", sep = ",", direction = "long") %>%
  mutate(value = 1) %>%
  spread(amenities, value) -> df.wide

all.equal(df.wide, data.wide)
#TRUE

A more concise and faster solution, as suggested by @A5C1D2H2I1M1N2O1R2T1:

cSplit_e(data, "amenities", ",", mode = "binary", type = "character", drop = TRUE)

Answer 1: (score: 0)

Using only the tidyverse:

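library(tidyverse)
data %>%
  separate_rows(amenities, sep = ",") %>%  # one row per (id, amenity) pair
  table() %>%                              # id x amenities contingency table of counts
  data.frame() %>%                         # back to long form with a Freq column
  spread(amenities, Freq)                  # one column per amenity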
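
For comparison, the same indicator layout can also be sketched in base R without any packages. This is only an illustration on a made-up data frame (toy, its ids and its amenity values are hypothetical), not a benchmarked alternative to the answers above:

```r
# Hypothetical toy data, much smaller than the sample in the question
toy <- data.frame(
  id = c(1L, 2L, 3L),
  amenities = c("TV,Kitchen", "Kitchen,Iron", "TV"),
  stringsAsFactors = FALSE
)

# Split each string into its individual items (fixed = TRUE: treat "," literally)
items <- strsplit(toy$amenities, ",", fixed = TRUE)

# The distinct items become the new column names
all.items <- sort(unique(unlist(items)))

# One row per id, one column per item: 1 if the item is present, 0 otherwise
indicators <- do.call(rbind, lapply(items, function(x) as.integer(all.items %in% x)))
colnames(indicators) <- all.items

toy.wide <- cbind(toy["id"], indicators)
print(toy.wide)
```

Unlike the reshaping approaches, this writes 0 rather than NA for absent items, and it assumes the items carry no surrounding whitespace.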