I have a data frame (theData) whose values are pipe-delimited:
Col1 Col2 Col3
1 colors red|green|purple
1 colors red|pink|yellow
1 colors yellow|mauve|purple
1 colors red|green|orange
1 colors red|yellow|purple
1 colors red|green|purple
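To make the answers below easy to try, here is one way to rebuild theData from the table above. This is a sketch that assumes Col1 is numeric and Col2/Col3 are character. The stringsAsFactors = FALSE argument matters on R versions before 4.0.0, where data.frame() turned strings into factors by default, and strsplit() refuses factor input with a "non-character argument" error.

```r
# Rebuild the example data frame from the question.
# Assumption: Col1 is numeric, Col2 and Col3 are character.
theData <- data.frame(
  Col1 = rep(1, 6),
  Col2 = rep("colors", 6),
  Col3 = c("red|green|purple", "red|pink|yellow", "yellow|mauve|purple",
           "red|green|orange", "red|yellow|purple", "red|green|purple"),
  # Keep strings as character (the default only from R 4.0.0 onward)
  stringsAsFactors = FALSE
)
str(theData)
```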
I want to split Col3 into additional columns, like this:
Col1 Col2 Col3 Col4 Col5
1 colors red green purple
1 colors red pink yellow
1 colors yellow mauve purple
1 colors red green orange
1 colors red yellow purple
1 colors red green purple
I tried the following:
str_split_fixed(as.character(theData$Col3), "|", 3)
but it doesn't work.
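The attempt fails because the pattern argument is a regular expression, and `|` is the regex alternation operator: on its own it matches the empty string rather than a literal pipe. A small base-R sketch (using strsplit() instead of stringr so it needs no packages) shows the broken pattern next to three working ones:

```r
x <- "red|green|purple"

# "|" is alternation between two empty patterns, so it matches the empty
# string and strsplit() falls back to splitting into single characters
broken <- strsplit(x, "|")[[1]]

# Three ways to match a literal pipe
via_class  <- strsplit(x, "[|]")[[1]]             # character class
via_escape <- strsplit(x, "\\|")[[1]]             # escaped metacharacter
via_fixed  <- strsplit(x, "|", fixed = TRUE)[[1]] # no regex at all

print(broken)
print(via_fixed)
```

The same three spellings work as the pattern in str_split_fixed().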
Answer 0 (score: 2)
My cSplit function (from the splitstackshape package) handles this kind of problem easily:
library(splitstackshape)
cSplit(theData, "Col3", "|")
# Col1 Col2 Col3_1 Col3_2 Col3_3
# 1: 1 colors red green purple
# 2: 1 colors red pink yellow
# 3: 1 colors yellow mauve purple
# 4: 1 colors red green orange
# 5: 1 colors red yellow purple
# 6: 1 colors red green purple
The result is a data.table, because the function relies on the data.table package for its efficiency, especially with larger datasets.
Answer 1 (score: 1)
You could also try colsplit from the reshape package:
library(reshape)
cbind(theData[,1:2],
colsplit(theData$Col3, "[|]", names=c("Col3", "Col4", "Col5")))
# Col1 Col2 Col3 Col4 Col5
#1 1 colors red green purple
#2 1 colors red pink yellow
#3 1 colors yellow mauve purple
#4 1 colors red green orange
#5 1 colors red yellow purple
#6 1 colors red green purple
Or just use read.table:
cbind(theData[,1:2],
setNames(read.table(text=as.character(theData$Col3), sep="|", header=FALSE, stringsAsFactors=FALSE), paste0("Col",3:5)))
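The read.table approach treats each string in Col3 as one line of a file, and its sep argument is a literal single character rather than a regex, so the pipe needs no escaping. It also offers fill = TRUE, which pads rows that have fewer pieces. A sketch with a deliberately ragged input (the second string is made up for illustration):

```r
pieces <- c("red|green|purple", "red|pink")  # second row has only two pieces

parts <- read.table(
  text = pieces,
  sep = "|",             # literal separator, not a regex
  fill = TRUE,           # pad short rows with blank fields
  quote = "",            # don't treat quote characters specially
  comment.char = "",     # don't treat "#" as the start of a comment
  stringsAsFactors = FALSE,
  col.names = paste0("Col", 3:5)
)
print(parts)
```

For character columns the padded field comes back as an empty string rather than NA.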
Answer 2 (score: 1)
Adding one more option, in case you like it: separate() from Hadley's tidyr package. The code is clean.
> library(tidyr)
> test <- data.frame(Col3 = c("red|green|purple", "red|pink|yellow"))
> test
Source: local data frame [2 x 1]
Col3
1 red|green|purple
2 red|pink|yellow
> test %>% separate(Col3, c("A", "B", "C"), sep = "\\|")
Source: local data frame [2 x 3]
A B C
1 red green purple
2 red pink yellow
Answer 3 (score: 0)
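You just need to wrap the | in [] or escape it as \\|, because | is a regular-expression metacharacter (alternation). This looks like a job for mapply.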
> m <- mapply(strsplit, as.character(theData$Col3), split = "[|]", USE.NAMES = FALSE)
> setNames(cbind(theData[-3], do.call(rbind, m)), paste0("Col", 1:5))
# Col1 Col2 Col3 Col4 Col5
# 1 1 colors red green purple
# 2 1 colors red pink yellow
# 3 1 colors yellow mauve purple
# 4 1 colors red green orange
# 5 1 colors red yellow purple
# 6 1 colors red green purple
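Because strsplit() is already vectorised over its input, the mapply() wrapper isn't strictly needed. A sketch of the same idea with fixed = TRUE (so the pipe needs no escaping), assuming Col3 is character and every row has the same number of pieces:

```r
theData <- data.frame(
  Col1 = 1, Col2 = "colors",
  Col3 = c("red|green|purple", "red|pink|yellow"),
  stringsAsFactors = FALSE
)

# One character vector per row; fixed = TRUE matches "|" literally
pieces <- strsplit(theData$Col3, "|", fixed = TRUE)

# Stack the per-row vectors into a matrix, one column per piece
m <- do.call(rbind, pieces)

result <- setNames(cbind(theData[-3], m), paste0("Col", 1:5))
print(result)
```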
Your str_split_fixed attempt needs only a small change:
> library(stringr)
> cbind(theData[-3], str_split_fixed(theData$Col3, "[|]", 3))
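One difference worth knowing if rows can have different numbers of pieces: str_split_fixed() always returns exactly n columns and pads short rows, whereas do.call(rbind, ...) on strsplit() output recycles shorter vectors (warning only when the lengths don't divide evenly), which can silently repeat values. A base-R sketch that pads with NA instead; split_pad() is a hypothetical helper, not a function from any package:

```r
# Hypothetical helper (not from any package): split each string on a
# literal delimiter and pad shorter results with NA so that every row
# ends up with the same number of columns. Assumes a non-empty input.
split_pad <- function(x, delim = "|") {
  pieces <- strsplit(as.character(x), delim, fixed = TRUE)
  width <- max(lengths(pieces))
  # Indexing past the end of a vector returns NA, which does the padding
  padded <- lapply(pieces, function(p) p[seq_len(width)])
  matrix(unlist(padded), nrow = length(pieces), byrow = TRUE)
}

m <- split_pad(c("red|green|purple", "red|pink"))
print(m)
```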