I have a data frame with 3 columns:
A <- c("stringA", "stringA", "stringB", "stringB")
B <- c(1, 2, 1, 2)
C <- c("abcd", "abcd", "abcde", "bbc")
df <- data.frame(A, B, C)
> df
        A B     C
1 stringA 1  abcd
2 stringA 2  abcd
3 stringB 1 abcde
4 stringB 2   bbc
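I would like to reshape it so that the values of column B become the column names and the values in column C are split into single letters, giving: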
A       1 2
stringA a a
stringA b b
stringA c c
stringA d d
stringB a b
stringB b b
stringB c c
stringB d NA
stringB e NA
Answer 0 (score: 3)
Here is an approach using "data.table" and "reshape2". First make sure you are using at least version 1.8.11 of the "data.table" package.
library(reshape2)
library(data.table)
packageVersion("data.table")
# [1] ‘1.8.11’
DT <- data.table(df, key="A,B")
DT <- DT[, list(C = unlist(strsplit(as.character(C), ""))), by = key(DT)]
DT[, N := sequence(.N), by = key(DT)]
dcast.data.table(DT, A + N ~ B, value.var="C")
#          A N 1  2
# 1: stringA 1 a  a
# 2: stringA 2 b  b
# 3: stringA 3 c  c
# 4: stringA 4 d  d
# 5: stringB 1 a  b
# 6: stringB 2 b  b
# 7: stringB 3 c  c
# 8: stringB 4 d NA
# 9: stringB 5 e NA
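To see what the two intermediate steps contribute, here is a minimal sketch that uses only base R on the same `C` values. `strsplit(x, "")` breaks each string into single characters, and `sequence()` numbers the characters within each string. That counter plays the role of `N` above: it is what lets the cast line up the first letter of one string with the first letter of another. The names `chars`, `pos` and `long` are made up for illustration and are not part of the answer's code.

```r
# Same values as column C in the question
C <- c("abcd", "abcd", "abcde", "bbc")

# One character vector per input string
chars <- strsplit(C, "")

# Position of each character within its own string (the role N plays above)
pos <- sequence(lengths(chars))

# Long format: one row per character, tagged with its source row and position
long <- data.frame(row    = rep(seq_along(C), lengths(chars)),
                   pos    = pos,
                   letter = unlist(chars))
long
```

Without the position counter, the cast would have no way to tell the several rows that share the same `A` and `B` apart.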
If you would rather stick with base R, here is a somewhat similar approach:
## Split the "C" column up
X <- strsplit(as.character(df$C), "")
## "Expand" your data.frame
df2 <- df[rep(seq_along(X), sapply(X, length)), ]
## Create an additional "id"
df2$id <- with(df2, ave(as.character(A), A, B, FUN = seq_along))
## Replace your "C" values
df2$C <- unlist(X)
## Reshape your data
reshape(df2, direction = "wide", idvar=c("A", "id"), timevar="B")
#           A id C.1  C.2
# 1   stringA  1   a    a
# 1.1 stringA  2   b    b
# 1.2 stringA  3   c    c
# 1.3 stringA  4   d    d
# 3   stringB  1   a    b
# 3.1 stringB  2   b    b
# 3.2 stringB  3   c    c
# 3.3 stringB  4   d <NA>
# 3.4 stringB  5   e <NA>
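The reshaped result keeps the helper `id` column, prefixes the new columns with `C.`, and carries over row names from the expanded data frame. If the exact layout from the question is wanted, a short clean-up along these lines should work. This is only a sketch: it rebuilds the data from scratch so that it runs on its own, and `wide` and `out` are names chosen here for illustration.

```r
# Rebuild the example data and repeat the base R steps
df <- data.frame(A = c("stringA", "stringA", "stringB", "stringB"),
                 B = c(1, 2, 1, 2),
                 C = c("abcd", "abcd", "abcde", "bbc"))

X <- strsplit(as.character(df$C), "")
df2 <- df[rep(seq_along(X), lengths(X)), ]
df2$id <- with(df2, ave(as.character(A), A, B, FUN = seq_along))
df2$C <- unlist(X)
wide <- reshape(df2, direction = "wide", idvar = c("A", "id"), timevar = "B")

# Drop the helper column, strip the "C." prefix, and reset the row names
out <- wide[, c("A", "C.1", "C.2")]
names(out) <- sub("^C\\.", "", names(out))
rownames(out) <- NULL
out
```

Columns named `1` and `2` are not syntactic R names, so they have to be accessed with backticks or `[[`, for example `out[["1"]]`.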