I want to split characters. My real data frame is large, but the small example below shows what needs to be done.
mydf <- data.frame(name = c("L1", "L2", "L3"),
                   M1 = c("AC", "AT", NA), M2 = c("CC", "--", "TC"), M3 = c("AT", "TT", "AG"))
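I want to split the characters of variables M1 through M3 into two columns each (in the real dataset I have more than 6000 variables), like this: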
name M1a M1b M2a M2b M3a M3b
L1 A C C C A T
L2 A T - - T T
L3 NA NA T C A G
I tried the following code:
func<- function(x) {sapply( strsplit(x, ""),
match, table= c("A","C","T","G", "--", NA))}
odataframe <- data.frame(apply(mydf, 1, func) )
colnames(odataframe) <- paste(rep(names(mydf), each = 2), c("a", "b"), sep = "")
odataframe
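The attempt has two problems: `apply(mydf, 1, func)` works row by row (and includes the `name` column), and `match()` returns positions in the lookup table rather than the characters themselves. On top of that, `strsplit()` breaks `"--"` into two `"-"` pieces, which never match the `"--"` entry. A quick check on single values, where `lookup` and `dash` are illustrative names:

```r
# Illustrative check of why the attempt yields numbers/NA instead of letters.
lookup <- c("A", "C", "T", "G", "--", NA)

# strsplit() breaks "--" into two "-" characters...
dash <- strsplit("--", "")[[1]]
print(dash)                        # "-" "-"
# ...and "-" is not in the lookup table, so match() gives NA for both.
print(match(dash, lookup))         # NA NA

# For real letters, match() returns positions in the table, not the letters.
print(match(c("A", "C"), lookup))  # 1 2
```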
Answer 0 (score: 3)
Here you go:
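splitCol <- function(x){
  x <- as.character(x)      # works for factor or character columns
  x[is.na(x)] <- "$$"       # two-character placeholder so NA still splits into two pieces
  z <- matrix(unlist(strsplit(x, split="")), ncol=2, byrow=TRUE)
  z[z=="$"] <- NA           # turn the placeholder back into NA
  z
}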
newdf <- as.data.frame(do.call(cbind, lapply(mydf[, -1], splitCol)))
names(newdf) <- paste(rep(names(mydf[, -1]), each=2), c("a", "b"), sep="")
newdf <- data.frame(mydf[, 1, drop=FALSE], newdf)
newdf
name M1a M1b M2a M2b M3a M3b
1 L1 A C C C A T
2 L2 A T - - T T
3 L3 <NA> <NA> T C A G
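The `"$$"` placeholder matters because `strsplit()` returns a single `NA` for a missing string rather than two pieces, which throws off the two-column `byrow = TRUE` layout. A small check of that behaviour, with `x`, `x2`, `pieces` and `z` as illustrative names:

```r
# Why the NA values are replaced with "$$" before splitting (illustrative check).
x <- c("AC", "AT", NA)

# strsplit() returns one NA for a missing string, not two pieces...
pieces <- unlist(strsplit(x, split = ""))
length(pieces)   # 5, not 6, so a 2-column matrix would be misaligned

# ...so substitute a two-character placeholder, split, then restore NA.
x2 <- x
x2[is.na(x2)] <- "$$"
z <- matrix(unlist(strsplit(x2, split = "")), ncol = 2, byrow = TRUE)
z[z == "$"] <- NA
z
```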
Answer 1 (score: 1)
Andrie's code as a reusable function:
splitCol <- function(dataframe, splitVars=names(dataframe)){
  # drop=FALSE keeps a data frame even when only one column is selected
  split.DF <- dataframe[, splitVars, drop=FALSE]
  keep.DF <- dataframe[, !names(dataframe) %in% splitVars, drop=FALSE]
  # Split each two-character entry into two columns; NA stays NA in both
  X <- function(x){
    x <- as.character(x)
    x[is.na(x)] <- "$$"
    z <- matrix(unlist(strsplit(x, split="")), ncol=2, byrow=TRUE)
    z[z=="$"] <- NA
    z
  }
  newdf <- as.data.frame(do.call(cbind, lapply(split.DF, X)))
  names(newdf) <- paste(rep(names(split.DF), each=2), c(".a", ".b"), sep="")
  data.frame(keep.DF, newdf)
}
Testing it out:
splitCol(mydf)
splitCol(mydf, c('M1','M2'))
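Selecting a single variable, e.g. `splitCol(mydf, 'M1')`, depends on how `[` treats one column: by default `dataframe[, 'M1']` returns a plain vector, so `lapply()` walks over individual values instead of the column, while `drop = FALSE` keeps a one-column data frame. The same applies to the kept columns when exactly one remains. A quick check, with `one_col` and `kept_df` as illustrative names:

```r
# Illustrative check of the drop = FALSE behaviour with a single selected column.
mydf <- data.frame(name = c("L1", "L2", "L3"),
                   M1 = c("AC", "AT", NA), M2 = c("CC", "--", "TC"),
                   M3 = c("AT", "TT", "AG"),
                   stringsAsFactors = FALSE)

one_col <- mydf[, "M1"]                # default drop = TRUE: a plain vector
kept_df <- mydf[, "M1", drop = FALSE]  # still a one-column data frame

is.data.frame(one_col)   # FALSE
is.data.frame(kept_df)   # TRUE
# lapply() over a vector iterates over its elements, not over a column:
length(lapply(one_col, identity))  # 3
length(lapply(kept_df, identity))  # 1
```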
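Please don't vote for this as the accepted answer. Andrie's answer was clearly the first correct one; this just extends his code to more cases. Thanks for the question, and thanks to Andrie for the code.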
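For the real data with more than 6000 variables, a different technique may be worth trying: the sketch below swaps `strsplit()` for vectorised `substr()` calls, which return `NA` for missing input, so no placeholder is needed. It assumes every non-missing entry is exactly two characters and that the ID column comes first; `split_two_chars` and `id_col` are hypothetical names, not part of either answer.

```r
# Sketch: split every two-character column into "a" and "b" columns with substr().
# Assumes each non-NA entry has exactly two characters and column id_col is an ID.
mydf <- data.frame(name = c("L1", "L2", "L3"),
                   M1 = c("AC", "AT", NA), M2 = c("CC", "--", "TC"),
                   M3 = c("AT", "TT", "AG"),
                   stringsAsFactors = FALSE)

split_two_chars <- function(df, id_col = 1) {
  vars <- df[-id_col]                 # all columns except the ID, as a data frame
  pieces <- lapply(vars, function(x) {
    x <- as.character(x)              # handles factor columns too
    # substr() on NA returns NA, so missing values stay missing in both halves
    list(a = substr(x, 1, 1), b = substr(x, 2, 2))
  })
  out <- as.data.frame(unlist(pieces, recursive = FALSE), stringsAsFactors = FALSE)
  names(out) <- paste0(rep(names(vars), each = 2), c("a", "b"))
  cbind(df[id_col], out)
}

res <- split_two_chars(mydf)
res
```

Because it works one column at a time with vectorised calls, it avoids the row-wise `apply()` from the original attempt, and the column names match the `M1a`, `M1b`, ... layout the question asked for.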