How to reshape a data frame in R, conditional on the maximum value?

Time: 2016-04-05 17:42:46

Tags: r max reshape

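I'm having some difficulty reshaping my data frame in R. I have 5 individuals: A, B, C, D and E. Some have 1 observation and some have 2. I measured 3 values for each observation: X, Y and Z. I want to convert my data frame from long to wide format, producing one row per individual with two sets of columns labelled X, Y and Z. However, I want to order the sets by X, so that the observation with the largest X comes first. For a given observation, the values of X, Y and Z must stay together, but whether observation 1 or observation 2 comes first depends on which one has the larger X.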

df = data.frame(
  indiv = c("A","A","B","C","C","D","D","E"),
  observ = c(1,2,1,1,2,1,2,1),
  X = c(rnorm(8, mean = 10, sd = 6)),
  Y = c(rnorm(8, mean = 0, sd = 2)),
  Z = c(rnorm(8, mean = 4, sd = 4))
)

  indiv observ         X         Y         Z
1     A      1  9.959043  1.785043 10.134511
2     A      2 14.122006 -2.257666  5.799366
3     B      1 11.562801 -1.394951  4.988923
4     C      1 12.955644 -4.330272  8.870165
5     C      2 13.582154 -1.727224   -7.5617
6     D      1  4.053437  1.815233  1.789157
7     D      2 12.990071 -1.989307   3.67696
8     E      1  2.820895 -3.754263  3.001725

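Below is the wide data frame I want. For individual A, X is larger in observation 2, so that set of values (X, Y, Z) comes first. In contrast, for individuals C and D, X is larger in observation 1, so that set comes first. I think it should be some variation on the reshape function, but I don't know how to condition on the maximum of X. Thanks in advance!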

  indiv observ         X          Y         Z observ        X         Y        Z
1     A      2 18.797087  0.3247862  4.774446      1 8.547868 0.3203667 6.729975
2     B      1  1.646638  0.7986036  6.938825     NA       NA        NA       NA
3     C      1 17.354905  -2.399272  8.357045      2 6.856093 0.6493722 2.420827
4     D      1 16.058101 -1.2370024  4.045489      2 7.641576 3.0820116 4.232615
5     E      1 13.625998 -0.1953445 -5.627932     NA       NA        NA       NA

1 Answer:

Answer 0: (score: 1)

I would order before I cast. The following uses data.table, since the dcast function is also in that package; the same can be done with a plain data.frame and reshape.

library(data.table)
set.seed(1)
df = data.frame(
  indiv = c("A","A","B","C","C","D","D","E"),
  observ = c(1,2,1,1,2,1,2,1),
  X = c(rnorm(8, mean = 10, sd = 6)),
  Y = c(rnorm(8, mean = 0, sd = 2)),
  Z = c(rnorm(8, mean = 4, sd = 4))
)
setDT(df)
df <- df[order(indiv, -X)]  # sort by individual, largest X first within each

df
   indiv observ         X           Y         Z
1:     A      2 11.101860 -0.61077677  7.775345
2:     A      1  6.241277  1.15156270  3.935239
3:     B      1  4.986228  3.02356234  7.284885
4:     C      1 19.571685  0.77968647  6.375605
5:     C      2 11.977047 -1.24248116  7.675909
6:     D      2 12.924574  2.24986184  4.298260
7:     D      1  5.077190 -4.42939977  7.128545
8:     E      1 14.429948 -0.08986722 -3.957407

df[, observ := as.numeric(1:.N), by = indiv]  # renumber observ to follow the new order

df
   indiv observ         X           Y         Z
1:     A      1 11.101860 -0.61077677  7.775345
2:     A      2  6.241277  1.15156270  3.935239
3:     B      1  4.986228  3.02356234  7.284885
4:     C      1 19.571685  0.77968647  6.375605
5:     C      2 11.977047 -1.24248116  7.675909
6:     D      1 12.924574  2.24986184  4.298260
7:     D      2  5.077190 -4.42939977  7.128545
8:     E      1 14.429948 -0.08986722 -3.957407
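
Note that renumbering observ discards the original observation numbers, which the desired output in the question keeps. One possible way around this, as a rough sketch rather than part of the original answer, is to leave observ untouched and put the within-individual order in a separate column. The column name `rank` and the object names `dt` and `wide` are made up for illustration; the cast relies on dcast from data.table accepting several value.var columns, as the answer's own call does.

```r
library(data.table)

# Same simulated data as in the answer
set.seed(1)
dt <- data.table(
  indiv  = c("A","A","B","C","C","D","D","E"),
  observ = c(1,2,1,1,2,1,2,1),
  X = rnorm(8, mean = 10, sd = 6),
  Y = rnorm(8, mean = 0, sd = 2),
  Z = rnorm(8, mean = 4, sd = 4)
)

# Sort by individual, largest X first within each individual
dt <- dt[order(indiv, -X)]

# `rank` is a hypothetical helper column: 1 marks the row with the largest X
dt[, rank := seq_len(.N), by = indiv]

# Cast on rank and carry the original observ along as a value column
wide <- dcast(dt, indiv ~ rank, value.var = c("observ", "X", "Y", "Z"))
wide
```

With this layout, observ_1 should hold the original number of whichever observation had the larger X, and observ_2 the other one (NA for individuals with a single observation).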

Now cast as normal:

dcast(df, indiv ~ observ, value.var = c("X","Y","Z"))

   indiv       X_1       X_2         Y_1       Y_2       Z_1      Z_2
1:     A 11.101860  6.241277 -0.61077677  1.151563  7.775345 3.935239
2:     B  4.986228        NA  3.02356234        NA  7.284885       NA
3:     C 19.571685 11.977047  0.77968647 -1.242481  6.375605 7.675909
4:     D 12.924574  5.077190  2.24986184 -4.429400  4.298260 7.128545
5:     E 14.429948        NA -0.08986722        NA -3.957407       NA

To get the column order you want, I think you need to melt and then cast:

dcast(melt(df, id.vars = c("indiv","observ")), indiv ~ observ + variable)
   indiv       1_X         1_Y       1_Z       2_X       2_Y      2_Z
1:     A 11.101860 -0.61077677  7.775345  6.241277  1.151563 3.935239
2:     B  4.986228  3.02356234  7.284885        NA        NA       NA
3:     C 19.571685  0.77968647  6.375605 11.977047 -1.242481 7.675909
4:     D 12.924574  2.24986184  4.298260  5.077190 -4.429400 7.128545
5:     E 14.429948 -0.08986722 -3.957407        NA        NA       NA
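
As mentioned at the top of this answer, the same idea can be done with a plain data.frame and reshape. The following is a minimal sketch of that route, assuming the same simulated data; using `ave` to renumber the observations is my own choice here, not something from the original answer.

```r
# Same simulated data as above, kept as a plain data.frame
set.seed(1)
df <- data.frame(
  indiv  = c("A","A","B","C","C","D","D","E"),
  observ = c(1,2,1,1,2,1,2,1),
  X = rnorm(8, mean = 10, sd = 6),
  Y = rnorm(8, mean = 0, sd = 2),
  Z = rnorm(8, mean = 4, sd = 4)
)

# Sort by individual, then by descending X within each individual
df <- df[order(df$indiv, -df$X), ]

# Renumber observations within each individual to follow the new order
df$observ <- ave(seq_along(df$indiv), df$indiv, FUN = seq_along)

# Long to wide: one row per individual, one set of X/Y/Z per observ value
wide <- reshape(df, idvar = "indiv", timevar = "observ",
                direction = "wide")
wide
```

Here reshape names the columns with a dot separator (X.1, Y.1, Z.1, X.2, ...) and, as far as I can tell, groups them by observation, so no extra melt step should be needed to keep each X, Y, Z set together.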