Reshaping a data frame to long format by expanding the elements of an existing column

Asked: 2014-02-27 16:43:49

Tags: r reshape reshape2

I have a data frame with 3 columns:

A <- c("stringA", "stringA", "stringB", "stringB")
B <- c(1, 2, 1, 2)
C <- c("abcd", "abcd", "abcde", "bbc")

df <- data.frame(A, B, C)

> df
        A B     C
1 stringA 1  abcd
2 stringA 2  abcd
3 stringB 1 abcde
4 stringB 2   bbc

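I would like to reshape it so that the values of column B become the column names and the values in column C are split into single letters, to get: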

A    1    2   
stringA    a    a
stringA    b    b
stringA    c    c
stringA    d    d
stringB    a    b
stringB    b    b
stringB    c    c
stringB    d    NA
stringB    e    NA

1 answer:

Answer 0 (score: 3)

Here is an approach using "data.table" and "reshape2". First make sure that you are using at least version 1.8.11 of the "data.table" package.

library(reshape2)
library(data.table)
packageVersion("data.table")
# [1] ‘1.8.11’

DT <- data.table(df, key="A,B")
DT <- DT[, list(C = unlist(strsplit(as.character(C), ""))), by = key(DT)]
DT[, N := sequence(.N), by = key(DT)]
dcast.data.table(DT, A + N ~ B, value.var="C")
#          A N 1  2
# 1: stringA 1 a  a
# 2: stringA 2 b  b
# 3: stringA 3 c  c
# 4: stringA 4 d  d
# 5: stringB 1 a  b
# 6: stringB 2 b  b
# 7: stringB 3 c  c
# 8: stringB 4 d NA
# 9: stringB 5 e NA
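
The same three steps (split, number, cast) can also be sketched without "reshape2". This is only a sketch, and it assumes a more recent "data.table" release in which `dcast()` is exported by the package itself. The intermediate names `long` and `wide` are made up for illustration, and `stringsAsFactors = FALSE` is set explicitly so that `C` is already a character column.

```r
library(data.table)

df <- data.frame(
  A = c("stringA", "stringA", "stringB", "stringB"),
  B = c(1, 2, 1, 2),
  C = c("abcd", "abcd", "abcde", "bbc"),
  stringsAsFactors = FALSE
)

DT <- as.data.table(df)

# Step 1: one row per character, grouped by the original (A, B) pair
long <- DT[, list(C = unlist(strsplit(C, ""))), by = c("A", "B")]

# Step 2: position of each character within its (A, B) group
long[, N := seq_len(.N), by = c("A", "B")]

# Step 3: spread B across the columns; positions that do not exist
# for a given B (e.g. the 4th letter of "bbc") are filled with NA
wide <- dcast(long, A + N ~ B, value.var = "C")
print(wide)
```

The helper column `N` is what keeps the letters apart: without it, `A` alone would not identify a row and the cast would have to aggregate several letters into one cell.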

If you prefer to stick with base R, this approach is somewhat similar:

## Split the "C" column up
X <- strsplit(as.character(df$C), "")

## "Expand" your data.frame
df2 <- df[rep(seq_along(X), sapply(X, length)), ]

## Create an additional "id"
df2$id <- with(df2, ave(as.character(A), A, B, FUN = seq_along))

## Replace your "C" values
df2$C <- unlist(X)

## Reshape your data
reshape(df2, direction = "wide", idvar=c("A", "id"), timevar="B")
#           A id C.1  C.2
# 1   stringA  1   a    a
# 1.1 stringA  2   b    b
# 1.2 stringA  3   c    c
# 1.3 stringA  4   d    d
# 3   stringB  1   a    b
# 3.1 stringB  2   b    b
# 3.2 stringB  3   c    c
# 3.3 stringB  4   d <NA>
# 3.4 stringB  5   e <NA>
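
The `reshape()` result still carries the helper `id` column, the `C.` prefix and the repeated-row names, none of which appear in the layout requested in the question. A minimal sketch of the clean-up follows, repeating the steps above so that it runs on its own. It assumes R 3.2.0 or later for `lengths()`, and it uses `sequence()` for the index instead of `ave()`, which is only valid because each (A, B) pair occupies exactly one row of `df`.

```r
df <- data.frame(
  A = c("stringA", "stringA", "stringB", "stringB"),
  B = c(1, 2, 1, 2),
  C = c("abcd", "abcd", "abcde", "bbc"),
  stringsAsFactors = FALSE
)

# Split each string into single characters
X <- strsplit(df$C, "")

# Repeat each row once per character and number the characters
df2 <- df[rep(seq_along(X), lengths(X)), ]
df2$id <- sequence(lengths(X))
df2$C <- unlist(X)

wide <- reshape(df2, direction = "wide", idvar = c("A", "id"), timevar = "B")

# Tidy up to match the layout requested in the question
rownames(wide) <- NULL                        # drop row names such as "1.1"
names(wide) <- sub("^C\\.", "", names(wide))  # "C.1" -> "1", "C.2" -> "2"
wide$id <- NULL                               # helper index no longer needed
print(wide)
```

Columns named `1` and `2` are not syntactic R names, so they have to be accessed as `wide[["1"]]` or with backticks. Keeping the `C.` prefix may be more convenient if the result is processed further.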