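I'm having some trouble reshaping my data frame in R. I have 5 individuals: A, B, C, D and E. Some have 1 observation and some have 2. I measured 3 values for each observation: X, Y and Z. I want to convert my data frame from long format to wide format, with one row per individual and two sets of columns labeled X, Y and Z. However, I want to arrange the values so that the observation with the largest X comes first. So for a given observation the values of X, Y and Z must stay together, but whether observation 1 or 2 comes first depends on which one has the larger X.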
df = data.frame(
indiv = c("A","A","B","C","C","D","D","E"),
observ = c(1,2,1,1,2,1,2,1),
X = c(rnorm(8, mean = 10, sd = 6)),
Y = c(rnorm(8, mean = 0, sd = 2)),
Z = c(rnorm(8, mean = 4, sd = 4))
)
indiv observ X Y Z
1 A 1 9.959043 1.785043 10.134511
2 A 2 14.122006 -2.257666 5.799366
3 B 1 11.562801 -1.394951 4.988923
4 C 1 12.955644 -4.330272 8.870165
5 C 2 13.582154 -1.727224 -7.5617
6 D 1 4.053437 1.815233 1.789157
7 D 2 12.990071 -1.989307 3.67696
8 E 1 2.820895 -3.754263 3.001725
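Below is the wide data frame I want. For individual A, X is larger in observation 2, so that set of values (X, Y, Z) comes first. In contrast, for individuals C and D, X is larger in observation 1, so that set comes first. I think it should be some variation of the reshape function, but I don't know how to order by the maximum of X. Thanks in advance!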
indiv observ X Y Z observ X Y Z
1 A 2 18.797087 0.3247862 4.774446 1 8.547868 0.3203667 6.729975
2 B 1 1.646638 0.7986036 6.938825 NA NA NA NA
3 C 1 17.354905 -2.399272 8.357045 2 6.856093 0.6493722 2.420827
4 D 1 16.058101 -1.2370024 4.045489 2 7.641576 3.0820116 4.232615
5 E 1 13.625998 -0.1953445 -5.627932 NA NA NA NA
Answer 0 (score: 1)
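I would order first, before casting. The following uses data.table, since the dcast function is also in that package; the same could be done with a plain data.frame and reshape.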
library(data.table)
set.seed(1)
df = data.frame(
indiv = c("A","A","B","C","C","D","D","E"),
observ = c(1,2,1,1,2,1,2,1),
X = c(rnorm(8, mean = 10, sd = 6)),
Y = c(rnorm(8, mean = 0, sd = 2)),
Z = c(rnorm(8, mean = 4, sd = 4))
)
setDT(df)
df <- df[order(indiv, -X)] # sort by individual, then by descending X
df
indiv observ X Y Z
1: A 2 11.101860 -0.61077677 7.775345
2: A 1 6.241277 1.15156270 3.935239
3: B 1 4.986228 3.02356234 7.284885
4: C 1 19.571685 0.77968647 6.375605
5: C 2 11.977047 -1.24248116 7.675909
6: D 2 12.924574 2.24986184 4.298260
7: D 1 5.077190 -4.42939977 7.128545
8: E 1 14.429948 -0.08986722 -3.957407
df[, observ := as.numeric(1:.N), by = indiv] # renumber observ to match the new order
df
indiv observ X Y Z
1: A 1 11.101860 -0.61077677 7.775345
2: A 2 6.241277 1.15156270 3.935239
3: B 1 4.986228 3.02356234 7.284885
4: C 1 19.571685 0.77968647 6.375605
5: C 2 11.977047 -1.24248116 7.675909
6: D 1 12.924574 2.24986184 4.298260
7: D 2 5.077190 -4.42939977 7.128545
8: E 1 14.429948 -0.08986722 -3.957407
Now cast as normal:
dcast(df, indiv ~ observ, value.var = c("X","Y","Z"))
indiv X_1 X_2 Y_1 Y_2 Z_1 Z_2
1: A 11.101860 6.241277 -0.61077677 1.151563 7.775345 3.935239
2: B 4.986228 NA 3.02356234 NA 7.284885 NA
3: C 19.571685 11.977047 0.77968647 -1.242481 6.375605 7.675909
4: D 12.924574 5.077190 2.24986184 -4.429400 4.298260 7.128545
5: E 14.429948 NA -0.08986722 NA -3.957407 NA
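Renumbering observ overwrites the original observation numbers, whereas the desired output in the question keeps them. A minimal sketch of one way to keep them, assuming a separate position column is acceptable: `pos` and `wide` are hypothetical names introduced here, and the data is rebuilt with the same seed so the block runs on its own.

```r
library(data.table)

# Same toy data as above; set.seed(1) makes the draws reproducible
set.seed(1)
dt <- data.table(
  indiv  = c("A","A","B","C","C","D","D","E"),
  observ = c(1,2,1,1,2,1,2,1),
  X = rnorm(8, mean = 10, sd = 6),
  Y = rnorm(8, mean = 0, sd = 2),
  Z = rnorm(8, mean = 4, sd = 4)
)

# Sort so the row with the largest X comes first within each individual
dt <- dt[order(indiv, -X)]

# 'pos' (hypothetical helper): 1 for the largest-X row, 2 for the next one
dt[, pos := rowid(indiv)]

# Cast on 'pos' and carry the original 'observ' along as a value column
wide <- dcast(dt, indiv ~ pos, value.var = c("observ", "X", "Y", "Z"))

# Interleave the columns so each observation's values sit together
setcolorder(wide, c("indiv",
                    "observ_1", "X_1", "Y_1", "Z_1",
                    "observ_2", "X_2", "Y_2", "Z_2"))
wide
```

Here observ_1 holds the original number of whichever observation had the larger X, so the untouched observ column still identifies each block.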
To get the column order you want, I think you need to melt and then cast:
dcast(melt(df, id.vars = c("indiv","observ")), indiv ~ observ + variable)
indiv 1_X 1_Y 1_Z 2_X 2_Y 2_Z
1: A 11.101860 -0.61077677 7.775345 6.241277 1.151563 3.935239
2: B 4.986228 3.02356234 7.284885 NA NA NA
3: C 19.571685 0.77968647 6.375605 11.977047 -1.242481 7.675909
4: D 12.924574 2.24986184 4.298260 5.077190 -4.429400 7.128545
5: E 14.429948 -0.08986722 -3.957407 NA NA NA
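As mentioned at the top of the answer, the same idea should also work with a plain data.frame and reshape. The following is only a sketch under that assumption, not a tested drop-in replacement for the data.table code: `pos` and `wide` are hypothetical names, and the original observ is kept as an ordinary value column.

```r
# Same toy data as above; set.seed(1) makes the draws reproducible
set.seed(1)
df <- data.frame(
  indiv  = c("A","A","B","C","C","D","D","E"),
  observ = c(1,2,1,1,2,1,2,1),
  X = rnorm(8, mean = 10, sd = 6),
  Y = rnorm(8, mean = 0, sd = 2),
  Z = rnorm(8, mean = 4, sd = 4)
)

# Sort by individual, then by descending X
df <- df[order(df$indiv, -df$X), ]

# 'pos' (hypothetical helper) numbers the rows 1, 2, ... within each individual
df$pos <- ave(seq_along(df$indiv), df$indiv, FUN = seq_along)

# Long to wide: one row per individual, one block of columns per position
wide <- reshape(df, idvar = "indiv", timevar = "pos",
                direction = "wide", sep = "_")
rownames(wide) <- NULL
wide
```

In wide direction, reshape adds the value columns one position at a time, so the result is already grouped as observ_1, X_1, Y_1, Z_1 followed by the second block, and no melt step is needed for the column order.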