I have a column in a data frame that I split into four separate columns using colsplit.
df <- transform(df, concatenation = colsplit(concatenation, pattern="->-",
names = c('att1', 'att2','att3', 'att4')))
OR
df$concatenation <- colsplit(df$concatenation, pattern="->-",
names = c('att1', 'att2','att3', 'att4'))
concatenation
a->-a->-b->-c
b->-a->-b->-d
3->-a->-x->-c
2->-a->-y->-8
Now I have the following columns: concatenation.att1, concatenation.att2, and so on.
concatenation.att1 concatenation.att2 concatenation.att3 concatenation.att4
a a b c
b a b d
3 a x c
2 a y 8
When I try to export this data frame to CSV, I get the following error:
Error in ncol(xj) : object 'xj' not found
OR
Error in if (inherits(X[[j]], "data.frame") && ncol(xj) > 1L) X[[j]] <- as.matrix(X[[j]]) :
missing value where TRUE/FALSE needed
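From my research I gathered that this comes from my nested columns, but I can't find a simple way to flatten the data frame (as shown below) so it can be exported to CSV.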
att1 att2 att3 att4
a a b c
b a b d
3 a x c
2 a y 8
At the moment I reassign the data to the top level and delete the nested column, but I'm sure there is a better way to do this.
df$att1 <- df$concatenation$att1
df$att2 <- df$concatenation$att2
df$att3 <- df$concatenation$att3
df$att4 <- df$concatenation$att4
df$concatenation <- NULL
Here is a reproducible example:
library(reshape2) # assumption: colsplit() comes from reshape2
#read in table
df <- read.table(textConnection(
"concatenation Value
AFG->-Afghanistan->-1950->-True 20,249
AFG->-Afghanistan->-1951->-True 21,352
AFG->-Afghanistan->-1952->-True 22,532
AFG->-Afghanistan->-1953->-True 23,557
AFG->-Afghanistan->-1954->-True 24,555
ALB->-Albania->-1950->-True 8,097
ALB->-Albania->-1951->-True 8,986"), header=TRUE)
#Split concatenation var
df <- transform(df, concatenation = colsplit(concatenation, pattern="->-",
names = c('att1', 'att2','att3', 'att4')))
#write to csv
write.csv(df, "myfile.csv")
Answer 0 (score: 1)
Why do you need transform? Try this:
df$concatenation <- colsplit(df$concatenation, "->-",
names = c("att1", "att2","att3", "att4"))
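One thing worth checking after an assignment like this is whether the column is still a nested data frame. The sketch below is only an illustration in base R: the nested column is built by hand (an assumption, so the snippet does not depend on colsplit()) and uses two sample rows and two attributes. It shows one way to detect the nesting and flatten it with cbind().

```r
# Toy data; the nested column is constructed by hand (assumption) to mimic
# what assigning a data frame to df$concatenation produces.
df <- data.frame(Value = c("20,249", "21,352"), stringsAsFactors = FALSE)
df$concatenation <- data.frame(att1 = c("AFG", "AFG"),
                               att2 = c("Afghanistan", "Afghanistan"),
                               stringsAsFactors = FALSE)

# A column that is itself a data frame indicates nesting
nested <- vapply(df, is.data.frame, logical(1))
print(nested)

# Flatten: place the inner columns next to the remaining outer columns
flat <- cbind(df$concatenation, df[!nested])
str(flat)
```

If more than one column were nested, the same idea would need to be applied to each of them; this sketch handles a single nested column only.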
Answer 1 (score: 1)
It looks like tidyr::separate will do this.
nm <- c('att1', 'att2','att3', 'att4')
df2 <- tidyr::separate(df, concatenation, nm, sep = "->-")
sapply(df2, typeof)
# att1 att2 att3 att4 Value
# "character" "character" "character" "character" "integer"
write.csv(df2)
# "","att1","att2","att3","att4","Value"
# "1","AFG","Afghanistan","1950","True","20,249"
# "2","AFG","Afghanistan","1951","True","21,352"
# "3","AFG","Afghanistan","1952","True","22,532"
# "4","AFG","Afghanistan","1953","True","23,557"
# "5","AFG","Afghanistan","1954","True","24,555"
# "6","ALB","Albania","1950","True","8,097"
# "7","ALB","Albania","1951","True","8,986"
In base R, strsplit() will also work.
df3 <- do.call(rbind.data.frame, strsplit(as.character(df$concatenation), "->-"))
cbind(setNames(df3, nm), df["Value"])
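To round off the base R route, here is a minimal end-to-end sketch, assuming two rows copied from the question and a temporary output file (the path from tempfile() is a stand-in, not the asker's "myfile.csv"). It splits on the literal separator with fixed = TRUE, builds a flat data frame, writes it, and reads it back to confirm the result.

```r
# Two sample rows taken from the question's reproducible example
df <- data.frame(
  concatenation = c("AFG->-Afghanistan->-1950->-True",
                    "ALB->-Albania->-1950->-True"),
  Value = c("20,249", "8,097"),
  stringsAsFactors = FALSE
)
nm <- c("att1", "att2", "att3", "att4")

# Split each string on the literal separator; fixed = TRUE skips regex parsing
parts <- strsplit(df$concatenation, "->-", fixed = TRUE)

# Stack the pieces into a character matrix, then into a flat data frame
mat <- do.call(rbind, parts)
flat <- data.frame(mat, stringsAsFactors = FALSE)
names(flat) <- nm
flat$Value <- df$Value

# Write to a temporary file and read it back as text to check the round trip
out <- tempfile(fileext = ".csv")
write.csv(flat, out, row.names = FALSE)
back <- read.csv(out, colClasses = "character")
print(back)
```

This assumes every string contains exactly four pieces; rows with a different number of separators would need to be handled before the rbind step.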