Flattening a list column in R

Date: 2017-02-15 00:08:42

Tags: r csv

I have a column in a data frame that I split into four separate columns using colsplit.

df <- transform(df, concatenation = colsplit(concatenation, pattern="->-",
 names = c('att1', 'att2','att3', 'att4')))

OR

df$concatenation <- colsplit(df$concatenation, pattern="->-",
 names = c('att1', 'att2','att3', 'att4'))

concatenation 
a->-a->-b->-c
b->-a->-b->-d
3->-a->-x->-c
2->-a->-y->-8

Now I have the following columns: concatenation.att1, concatenation.att2, and so on.

concatenation.att1 concatenation.att2 concatenation.att3 concatenation.att4
a                  a                  b                  c
b                  a                  b                  d
3                  a                  x                  c
2                  a                  y                  8

When I try to export this data frame to CSV, I get the following error:

Error in ncol(xj) : object 'xj' not found

OR

Error in if (inherits(X[[j]], "data.frame") && ncol(xj) > 1L) X[[j]] <- as.matrix(X[[j]]) : 
  missing value where TRUE/FALSE needed

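From my research I gather that this is caused by my nested columns, but I can't find a simple way to flatten the data frame (as shown below) so that it can be exported to CSV.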

att1 att2 att3 att4
a    a    b    c
b    a    b    d
3    a    x    c
2    a    y    8

At the moment I reassign the data to the top level and delete the nested column, but I'm sure there is a better way to do this.

df$att1 <- df$concatenation$att1
df$att2 <- df$concatenation$att2
df$att3 <- df$concatenation$att3
df$att4 <- df$concatenation$att4

df$concatenation <- NULL
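The column-by-column reassignment above can probably be collapsed into a single cbind() call. The following is only a sketch on hypothetical toy data: the nested column is built by hand (so it does not depend on colsplit), and the name `flat` is made up for illustration.

```r
# Hypothetical toy data: 'concatenation' is a data frame stored as a single
# column, built by hand here so the sketch does not need colsplit().
df <- data.frame(Value = c("20,249", "21,352"), stringsAsFactors = FALSE)
df$concatenation <- data.frame(
  att1 = c("AFG", "AFG"),
  att2 = c("Afghanistan", "Afghanistan"),
  stringsAsFactors = FALSE
)

# Bind the inner columns next to the remaining outer columns,
# leaving the nested column itself behind.
flat <- cbind(df$concatenation, df[setdiff(names(df), "concatenation")])

names(flat)
# [1] "att1"  "att2"  "Value"
```

Because `flat` contains only ordinary atomic columns, it should be safe to pass to write.csv().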

Here is a reproducible example:

library(reshape2)  # assumed source of colsplit(), given the pattern= argument

#read in table
df <- read.table(textConnection(
  "concatenation     Value
AFG->-Afghanistan->-1950->-True    20,249
  AFG->-Afghanistan->-1951->-True    21,352
  AFG->-Afghanistan->-1952->-True    22,532
  AFG->-Afghanistan->-1953->-True    23,557
  AFG->-Afghanistan->-1954->-True    24,555
  ALB->-Albania->-1950->-True    8,097
  ALB->-Albania->-1951->-True    8,986"), header=TRUE)

#Split concatenation var
df <- transform(df, concatenation = colsplit(concatenation, pattern="->-",
                                             names = c('att1', 'att2','att3', 'att4')))
#write to csv
write.csv(df, "myfile.csv")
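To check whether a nested column is really what write.csv() trips over, one option is to test each column with is.data.frame(). This is a minimal sketch on hypothetical toy data, with the nested column built by hand rather than with colsplit(); `nested` is an illustrative name.

```r
# Hypothetical toy data with one nested data-frame column.
df <- data.frame(Value = c("8,097", "8,986"), stringsAsFactors = FALSE)
df$concatenation <- data.frame(att1 = c("ALB", "ALB"), att3 = c(1950L, 1951L))

# TRUE for every column that is itself a data frame.
nested <- vapply(df, is.data.frame, logical(1))

names(df)[nested]
# [1] "concatenation"
```

Any column reported here has to be flattened (or dropped) before the frame is an ordinary table of atomic columns.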

2 answers:

Answer 0 (score: 1)

Why do you need transform? Try this:

df$concatenation <- colsplit(df$concatenation, "->-",
                    names = c("att1", "att2","att3", "att4"))

Answer 1 (score: 1)

It looks like tidyr::separate will do this.

nm <- c('att1', 'att2','att3', 'att4')
df2 <- tidyr::separate(df, concatenation, nm, sep = "->-")

sapply(df2, typeof)
#        att1        att2        att3        att4       Value 
# "character" "character" "character" "character"   "integer" 
write.csv(df2)
# "","att1","att2","att3","att4","Value"
# "1","AFG","Afghanistan","1950","True","20,249"
# "2","AFG","Afghanistan","1951","True","21,352"
# "3","AFG","Afghanistan","1952","True","22,532"
# "4","AFG","Afghanistan","1953","True","23,557"
# "5","AFG","Afghanistan","1954","True","24,555"
# "6","ALB","Albania","1950","True","8,097"
# "7","ALB","Albania","1951","True","8,986"

In base R, strsplit() will work.

df3 <- do.call(rbind.data.frame, strsplit(as.character(df$concatenation), "->-"))
cbind(setNames(df3, nm), df["Value"])
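For completeness, here is a self-contained sketch of the base R route that also writes the result to a file. The two-row data frame, the temporary file, and the names `parts`, `mat`, `flat`, and `out` are assumptions made for illustration. It stacks the pieces into a character matrix first, a slight variation on the rbind.data.frame call above.

```r
# Hypothetical two-row version of the example data.
df <- data.frame(
  concatenation = c("AFG->-Afghanistan->-1950->-True",
                    "ALB->-Albania->-1951->-True"),
  Value = c("20,249", "8,986"),
  stringsAsFactors = FALSE
)
nm <- c("att1", "att2", "att3", "att4")

# fixed = TRUE treats "->-" as a literal string rather than a regular expression.
parts <- strsplit(df$concatenation, "->-", fixed = TRUE)

# Stack the pieces into a character matrix (one row per input row),
# convert it to a data frame, name the columns, and re-attach Value.
mat <- do.call(rbind, parts)
flat <- cbind(setNames(as.data.frame(mat, stringsAsFactors = FALSE), nm),
              df["Value"])

# Write to a temporary file; row.names = FALSE omits the leading index column.
out <- tempfile(fileext = ".csv")
write.csv(flat, out, row.names = FALSE)

names(flat)
# [1] "att1"  "att2"  "att3"  "att4"  "Value"
```

All split columns come back as character here, so something like type.convert() may be needed if att3 should be numeric.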