In R

Time: 2016-05-07 13:56:05

Tags: r dplyr

I have a "messy" dataset that I am trying to convert to tidy format. This is what the data looks like:

make <- c("Honda", "Jeep", "Nissan", "Ford")
model <- c("Civic", "Wrangler", "Altima", "Focus")
year <- c(1996, 2000, 2005, 1988)
color <- c("red;green;blue", "red;blue", "purple;red;green;black", "yellow;white;blue")
car.df <- data.frame(make, model, year, color)

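What I want to do is convert the data to tidy/long format by splitting the "color" field so that each make/model/year/color combination gets its own row. The output would look like this (shown for the Honda and Jeep only):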

make.new <- c("Honda", "Honda", "Honda", "Jeep", "Jeep")
model.new <- c("Civic", "Civic", "Civic", "Wrangler", "Wrangler")
year.new <- c(1996, 1996, 1996, 2000, 2000)
color.new <- c("red", "green", "blue", "red", "blue")
car.df.new <- data.frame(make.new, model.new, year.new, color.new)

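Any suggestions on how to do this? In the dataset there can be many different colors, so once the color field is split into separate columns there may be many columns to tidy up (and so each make/model/year will end up with a different number of rows in the tidy dataset).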

Thanks for any helpful suggestions!

Steve

1 answer:

Answer 0: (score: 2)

cSplit from splitstackshape can do this in a compact way. Specify splitCols ("color"), sep (";"), and direction ("long"), and it gives the expected output.

library(splitstackshape)
cSplit(car.df, "color", ";", "long")
#      make    model year  color
# 1:  Honda    Civic 1996    red
# 2:  Honda    Civic 1996  green
# 3:  Honda    Civic 1996   blue
# 4:   Jeep Wrangler 2000    red
# 5:   Jeep Wrangler 2000   blue
# 6: Nissan   Altima 2005 purple
# 7: Nissan   Altima 2005    red
# 8: Nissan   Altima 2005  green
# 9: Nissan   Altima 2005  black
#10:   Ford    Focus 1988 yellow
#11:   Ford    Focus 1988  white
#12:   Ford    Focus 1988   blue
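
To make the "long" split less of a black box, here is a minimal base R sketch of the same reshaping. It is not how cSplit is implemented; it only illustrates the idea of repeating each row once per color. The names parts and long.df are made up for this example, and the data frame is rebuilt with stringsAsFactors = FALSE so that color is a character column.

```r
# Same data as in the question, with color kept as character
car.df <- data.frame(
  make  = c("Honda", "Jeep", "Nissan", "Ford"),
  model = c("Civic", "Wrangler", "Altima", "Focus"),
  year  = c(1996, 2000, 2005, 1988),
  color = c("red;green;blue", "red;blue", "purple;red;green;black", "yellow;white;blue"),
  stringsAsFactors = FALSE
)

# Split each color string on ";" (list with one character vector per row)
parts <- strsplit(car.df$color, ";", fixed = TRUE)

# Repeat every row once per color it contains
long.df <- car.df[rep(seq_len(nrow(car.df)), lengths(parts)), ]

# Replace the combined strings with the individual colors
long.df$color <- unlist(parts)
rownames(long.df) <- NULL
```

Because the rows are repeated in place, the result stays grouped by car, in the same order as the cSplit output above.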

If we need a dplyr/tidyr solution:

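library(dplyr)
library(tidyr)
library(stringr)
# The row with the most colors determines how many temporary columns are needed
n.col <- max(str_count(as.character(car.df$color), ";")) + 1
car.df %>%
     separate(color, into = paste0("color", seq_len(n.col)), sep = ";", fill = "right") %>%
     # Stack the temporary columns; na.rm = TRUE drops the NA padding from fill = "right"
     gather(Var, color, -make, -model, -year, na.rm = TRUE) %>%
     select(-Var)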
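
As a shorter alternative to the separate/gather pipeline, the following sketch uses tidyr's separate_rows(), which splits a delimited column directly into rows. This swaps in a different function from the one used in the answer and assumes an installed tidyr version that provides separate_rows(); the name car.long is made up for the example.

```r
library(tidyr)

# Same data as in the question, with color kept as character
car.df <- data.frame(
  make  = c("Honda", "Jeep", "Nissan", "Ford"),
  model = c("Civic", "Wrangler", "Altima", "Focus"),
  year  = c(1996, 2000, 2005, 1988),
  color = c("red;green;blue", "red;blue", "purple;red;green;black", "yellow;white;blue"),
  stringsAsFactors = FALSE
)

# One output row per color; make, model and year are repeated as needed
car.long <- separate_rows(car.df, color, sep = ";")
```

No temporary columns are created here, so there is no need to count the delimiters first or to drop NA padding afterwards.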