部分转置R中的数据帧

时间:2013-10-15 15:09:40

标签: r reshape

给出以下数据集:

 transect <- c("B","N","C","D","H","J","E","L","I","I")
 sampler <- c(rep("J",5),rep("W",5))
 species <- c("ROB","HAW","HAW","ROB","PIG","HAW","PIG","PIG","HAW","HAW")
 weight <- c(2.80,52.00,56.00,2.80,16.00,55.00,16.20,18.30,52.50,57.00)
 wingspan <- c(13.9, 52.0, 57.0, 13.7, 11.0,52.5, 10.7, 11.1, 52.3, 55.1)
 week <- c(1,2,3,4,5,6,7,8,9,9)
 # Warning to R newbs: Really bad idea to use this code
 ex <- as.data.frame(cbind(transect,sampler,species,weight,wingspan,week))

我想要实现的是将物种及其相关信息转换为重量和翼展。为了更好地了解预期结果,请参阅下文。我的数据集大约有五十万行,大约有200种不同的物种,因此它将是一个非常大的数据帧。

      transect sampler week ROBweight HAWweight PIGweight ROBwingspan HAWwingspan PIGwingspan
1         B       J    1       2.8       0.0       0.0        13.9         0.0         0.0
2         N       J    2       0.0      52.0       0.0         0.0        52.0         0.0
3         C       J    3       0.0      56.0       0.0         0.0        57.0         0.0
4         D       J    4       2.8       0.0       0.0        13.7         0.0         0.0
5         H       J    5       0.0       0.0      16.0         0.0         0.0        11.0
6         J       W    6       0.0      55.0       0.0         0.0        52.5         0.0
7         E       W    7       0.0       0.0      16.2         0.0         0.0        10.7
8         L       W    8       0.0       0.0      18.3         0.0         0.0        11.1
9         I       W    9       0.0      52.5       0.0         0.0        52.3         0.0
10        I       W    9       0.0      57.0       0.0         0.0        55.1         0.0

1 个答案:

答案 0 :(得分:3)

主要问题是您目前没有唯一的“id”变量,这会为reshapedcast的常见嫌疑人带来问题。

这是一个解决方案。我使用了我的“splitstackshape”包中的getanID,但使用许多不同的方法创建自己的唯一ID变量非常容易。

library(splitstackshape)
library(reshape2)
idvars <- c("transect", "sampler", "week")
ex <- getanID(ex, id.vars=idvars)

从这里,您有两种选择:

来自基地R的

reshape

reshape(ex, direction = "wide", 
        idvar=c("transect", "sampler", "week", ".id"), 
        timevar="species")
来自“reshape2”的

meltdcast

首先,melt将您的数据转换为“长”形式。

exL <- melt(ex, id.vars=c(idvars, ".id", "species"))

然后,cast将您的数据转换为广泛的形式。

dcast(exL, transect + sampler + week + .id ~ species + variable)
#    transect sampler week .id HAW_weight HAW_wingspan PIG_weight PIG_wingspan ROB_weight ROB_wingspan
# 1         B       J    1   1         NA           NA         NA           NA        2.8         13.9
# 2         C       J    3   1       56.0         57.0         NA           NA         NA           NA
# 3         D       J    4   1         NA           NA         NA           NA        2.8         13.7
# 4         E       W    7   1         NA           NA       16.2         10.7         NA           NA
# 5         H       J    5   1         NA           NA       16.0         11.0         NA           NA
# 6         I       W    9   1       52.5         52.3         NA           NA         NA           NA
# 7         I       W    9   2       57.0         55.1         NA           NA         NA           NA
# 8         J       W    6   1       55.0         52.5         NA           NA         NA           NA
# 9         L       W    8   1         NA           NA       18.3         11.1         NA           NA
# 10        N       J    2   1       52.0         52.0         NA           NA         NA           NA

更好的选择:“data.table”

或者(也许最好),您可以使用“data.table”包(至少版本1.8.11),如下所示:

library(data.table)
library(reshape2) ## Also required here
packageVersion("data.table")
# [1] ‘1.8.11’
DT <- data.table(ex)
DT[, .id := sequence(.N), by = c("transect", "sampler", "week")]
DTL <- melt(DT, measure.vars=c("weight", "wingspan"))
dcast.data.table(DTL, transect + sampler + week + .id ~ species + variable)
#     transect sampler week .id HAW_weight HAW_wingspan PIG_weight PIG_wingspan ROB_weight ROB_wingspan
#  1:        B       J    1   1         NA           NA         NA           NA        2.8         13.9
#  2:        C       J    3   1       56.0         57.0         NA           NA         NA           NA
#  3:        D       J    4   1         NA           NA         NA           NA        2.8         13.7
#  4:        E       W    7   1         NA           NA       16.2         10.7         NA           NA
#  5:        H       J    5   1         NA           NA       16.0         11.0         NA           NA
#  6:        I       W    9   1       52.5         52.3         NA           NA         NA           NA
#  7:        I       W    9   2       57.0         55.1         NA           NA         NA           NA
#  8:        J       W    6   1       55.0         52.5         NA           NA         NA           NA
#  9:        L       W    8   1         NA           NA       18.3         11.1         NA           NA
# 10:        N       J    2   1       52.0         52.0         NA           NA         NA           NA

fill = 0添加到dcast版本之一,将NA值替换为0。