I have a data frame like the following.
A B C
V1 a 1 e
V2 b 2 f
V3 c 3 g,h,i
V4 d 4 j,k
Column C is of class list.
I want to unnest column C so that each element gets its own row in the data frame, as shown below.
A B C
V1 a 1 e
V2 b 2 f
V3 c 3 g
V4 c 3 h
V5 c 3 i
V6 d 4 j
V7 d 4 k
How can I do this?
Thank you very much.
Answer 0 (score: 3)
We can use cSplit from splitstackshape:
library(splitstackshape)
cSplit(df1, "C", sep = ",", direction = "long")
# A B C
#1: a 1 e
#2: b 2 f
#3: c 3 g
#4: c 3 h
#5: c 3 i
#6: d 4 j
#7: d 4 k
Or use unnest from tidyr:
library(tidyr)
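# Inline form from older tidyr; in tidyr >= 1.0.0 it is deprecated, so create the
# list column first or use separate_rows(df1, C, sep = ",")
unnest(df1, C=strsplit(C, ","))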
# A B C
#1 a 1 e
#2 b 2 f
#3 c 3 g
#4 c 3 h
#5 c 3 i
#6 d 4 j
#7 d 4 k
Or with base R:
lst <- strsplit(df1$C, ",")
transform(df1[rep(1:nrow(df1), lengths(lst)),-3], C= unlist(lst))
# A B C
#V1 a 1 e
#V2 b 2 f
#V3 c 3 g
#V3.1 c 3 h
#V3.2 c 3 i
#V4 d 4 j
#V4.1 d 4 k
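The suffixed row names above (V3.1, V3.2, and so on) come from repeating rows. As a rough sketch, the base R approach can be broken into steps, with the row names reset to V1 through V7 as in the question's desired output. The names n, idx, and out are made up for illustration, and df1 is rebuilt here only so the block runs on its own.

```r
# Rebuild the example data (same values as df1 in this answer)
df1 <- data.frame(A = c("a", "b", "c", "d"), B = 1:4,
                  C = c("e", "f", "g,h,i", "j,k"),
                  row.names = c("V1", "V2", "V3", "V4"),
                  stringsAsFactors = FALSE)

lst <- strsplit(df1$C, ",")          # one character vector per row
n   <- lengths(lst)                  # number of pieces per row
idx <- rep(seq_len(nrow(df1)), n)    # row index repeated once per piece

out <- df1[idx, c("A", "B")]         # repeat the A and B values
out$C <- unlist(lst)                 # attach the split values
rownames(out) <- paste0("V", seq_len(nrow(out)))  # renumber the rows
out
```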
Note: if column "C" is a factor, convert it to character before passing it to strsplit, i.e. strsplit(as.character(df1$C), ",")
If column "C" is a list, we can still use unnest:
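To illustrate the factor case, here is a small sketch. The data frame df1_fac is a hypothetical copy of df1 whose C column is stored as a factor. It is not part of the original data. strsplit expects a character vector, hence the as.character() call.

```r
# Hypothetical copy of df1 whose C column is a factor
df1_fac <- data.frame(A = c("a", "b", "c", "d"), B = 1:4,
                      C = factor(c("e", "f", "g,h,i", "j,k")),
                      stringsAsFactors = FALSE)

# Convert the factor to character before splitting
lst <- strsplit(as.character(df1_fac$C), ",")

# Repeat each row once per piece, drop the old C, then attach the split values
res <- transform(df1_fac[rep(seq_len(nrow(df1_fac)), lengths(lst)), -3],
                 C = unlist(lst))
res
```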
unnest(df2, C)
# A B C
#1 a 1 e
#2 b 2 f
#3 c 3 g
#4 c 3 h
#5 c 3 i
#6 d 4 j
#7 d 4 k
Or listCol_l from splitstackshape:
listCol_l(df2, "C")[]
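When C is already a list column, a base R variant needs no strsplit at all. The following is only a sketch under that assumption. The names n and out2 are illustrative, and df2 is rebuilt with the same values so the block is self-contained.

```r
# Rebuild a data frame with a list column (same values as df2)
df2 <- data.frame(A = c("a", "b", "c", "d"), B = 1:4,
                  stringsAsFactors = FALSE)
df2$C <- list("e", "f", c("g", "h", "i"), c("j", "k"))

n <- lengths(df2$C)                  # number of elements in each list entry

# Repeat A and B once per element and flatten the list column
out2 <- data.frame(A = rep(df2$A, n),
                   B = rep(df2$B, n),
                   C = unlist(df2$C),
                   stringsAsFactors = FALSE)
out2
```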
Data used in the examples:
df1 <- structure(list(A = c("a", "b", "c", "d"), B = 1:4, C = c("e",
"f", "g,h,i", "j,k")), .Names = c("A", "B", "C"),
class = "data.frame", row.names = c("V1", "V2", "V3", "V4"))
df2 <- structure(list(A = c("a", "b", "c", "d"), B = 1:4, C = list("e",
"f", c("g", "h", "i"), c("j", "k"))), .Names = c("A", "B",
"C"), row.names = c("V1", "V2", "V3", "V4"), class = "data.frame")
Answer 1 (score: 0)
Use:
# Split C on commas, then repeat A and B once per resulting piece
s <- strsplit(df1$C, split = ",")
data.frame(A = rep(df1$A, lengths(s)), B = rep(df1$B, lengths(s)), C = unlist(s))