I have a data frame that looks like this:
ID X1 X2 X3 X4 X5 X6 X7 X8
202 1 0 895 114 17 21 1 4
202 2 0 130 399 74 19 4 4
202 3 0 364 112 48 12 5 4
202 4 4 104 012 83 81 0 4
203 1 0 895 112 76 49 1 5
203 2 2 950 815 32 35 4 5
203 3 0 3.4 156 69 14 5 5
203 4 0 868 025 71 20 0 5
204 2 0 801 398 51 44 4 8
204 4 4 205 000 14 24 0 8
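I want to put all the data for each ID on a single row. The number of rows per ID varies, and "X1" identifies which rows are present. Column "X8" is constant within an ID. "X2" contains exactly one non-zero value per ID, and that is the only value I am interested in. If a value for one of the new columns is not available, it can be set to 999. So the end result should look like: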
ID X8 X2 X3_1 X4_1 X5_1 X6_1 X7_1 X3_2 X4_2 X5_2 X6_2 X7_2 X3_3 X4_3 X5_3 X6_3 X7_3 X3_4 X4_4 X5_4 X6_4 X7_4
202 4 4 895 114 17 21 1 130 399 74 19 4 364 112 48 12 5 104 12 83 81 0
203 5 2 895 112 76 49 1 950 815 32 35 4 3.4 156 69 14 5 868 25 71 20 0
204 8 4 999 999 999 999 999 801 398 51 44 4 999 999 999 999 999 205 0 14 24 0
I would like to do this in R. Thanks in advance for your help.
Answer 0 (score: 2)
Alternatively, using reshape2:
library(reshape2)
> df.melt = melt(df, id.vars =c("ID", "X1","X2", "X8"))
> df.cast = dcast(df.melt, ID + X8 ~variable + X1 , fill = 999)
> df.cast
ID X8 X3_1 X3_2 X3_3 X3_4 X4_1 X4_2 X4_3 X4_4 X5_1 X5_2 X5_3 X5_4 X6_1 X6_2 X6_3 X6_4 X7_1 X7_2 X7_3 X7_4
1 202 4 895 130 364 104 114 399 112 12 17 74 48 83 21 19 12 81 1 4 5 0
2 203 5 895 950 430 868 112 815 156 25 76 32 69 71 49 35 14 20 1 4 5 0
3 204 8 999 801 999 205 999 398 999 0 999 51 999 14 999 44 999 24 999 4 999 0
If needed, merge X2 back in:
> df.merge = merge(df.cast, df[df$X2!=0,c("ID", "X2")], by="ID", all.x =TRUE)
> df.new =df.merge[, c(1,ncol(df.merge), 2:(ncol(df.merge)-1))]
> df.new
ID X2 X8 X3_1 X3_2 X3_3 X3_4 X4_1 X4_2 X4_3 X4_4 X5_1 X5_2 X5_3 X5_4 X6_1 X6_2 X6_3 X6_4 X7_1 X7_2 X7_3 X7_4
1 202 4 4 895 130 364 104 114 399 112 12 17 74 48 83 21 19 12 81 1 4 5 0
2 203 2 5 895 950 430 868 112 815 156 25 76 32 69 71 49 35 14 20 1 4 5 0
3 204 4 8 999 801 999 205 999 398 999 0 999 51 999 14 999 44 999 24 999 4 999 0
Answer 1 (score: 1)
We can try
res <- Reduce(function(...) merge(..., by = c("ID", "X8"),
all=TRUE), split(df1[-(2:3)], df1$X1))
res[is.na(res)] <- 999
res$X2 <- df1$X2[df1$X2!=0]
colnames(res) <-make.unique(colnames(res))
res[c(1:2, 23, 3:22)]
# ID X8 X2 X3.x X4.x X5.x X6.x X7.x X3.y X4.y X5.y X6.y X7.y X3.x.1 X4.x.1 X5.x.1 X6.x.1 X7.x.1 X3.y.1 X4.y.1 X5.y.1 X6.y.1 X7.y.1
#1 202 4 4 895 114 17 21 1 130 399 74 19 4 364.0 112 48 12 5 104 12 83 81 0
#2 203 5 2 895 112 76 49 1 950 815 32 35 4 3.4 156 69 14 5 868 25 71 20 0
#3 204 8 4 999 999 999 999 999 801 398 51 44 4 999.0 999 999 999 999 205 0 14 24 0
Or we can use dcast from data.table, where value.var can take multiple columns:
library(data.table)
res1 <- dcast(setDT(df1), ID+X8~X1, value.var = paste0("X", 3:7),
fill = 999)[, X2 := df1$X2[df1$X2!=0]]
res1
# ID X8 X3_1 X3_2 X3_3 X3_4 X4_1 X4_2 X4_3 X4_4 X5_1 X5_2 X5_3 X5_4 X6_1 X6_2 X6_3 X6_4 X7_1 X7_2 X7_3 X7_4 X2
#1: 202 4 895 130 364.0 104 114 399 112 12 17 74 48 83 21 19 12 81 1 4 5 0 4
#2: 203 5 895 950 3.4 868 112 815 156 25 76 32 69 71 49 35 14 20 1 4 5 0 2
#3: 204 8 999 801 999.0 205 999 398 999 0 999 51 999 14 999 44 999 24 999 4 999 0 4
Answer 2 (score: 0)
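This can be done with base R reshape() by (1) treating the ID and X8 columns as the idvar columns, (2) treating the X1 column as the timevar column, (3) merging on ID with only the non-zero X2 rows after reshaping, and (4) replacing the NAs with 999 after reshaping.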
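A minimal sketch of these four steps, not necessarily the answerer's exact code. It assumes the question's data is read into a data frame called df. The names wide, x2 and res are illustrative only.

```r
# Sample data from the question; read.table parses 012, 025 and 000 as plain numbers.
df <- read.table(header = TRUE, text = "
ID X1 X2 X3 X4 X5 X6 X7 X8
202 1 0 895 114 17 21 1 4
202 2 0 130 399 74 19 4 4
202 3 0 364 112 48 12 5 4
202 4 4 104 012 83 81 0 4
203 1 0 895 112 76 49 1 5
203 2 2 950 815 32 35 4 5
203 3 0 3.4 156 69 14 5 5
203 4 0 868 025 71 20 0 5
204 2 0 801 398 51 44 4 8
204 4 4 205 000 14 24 0 8
")

# Steps (1) and (2): ID and X8 identify a row, X1 is the "time" index.
# X2 is left out here so that it is not spread across the wide columns.
wide <- reshape(
  df[, c("ID", "X8", "X1", paste0("X", 3:7))],
  idvar     = c("ID", "X8"),
  timevar   = "X1",
  direction = "wide",
  sep       = "_"
)

# Step (3): keep only the non-zero X2 row for each ID and merge it on ID.
x2  <- df[df$X2 != 0, c("ID", "X2")]
res <- merge(x2, wide, by = "ID")

# Put the columns in the order requested in the question: ID, X8, X2, then the rest.
res <- res[, c("ID", "X8", "X2", setdiff(names(res), c("ID", "X8", "X2")))]

# Step (4): combinations that did not exist in the long data come out as NA.
res[is.na(res)] <- 999

res
```

Because the merge is keyed on ID rather than relying on row order, X2 stays aligned with the right ID even if the input rows are shuffled. It does assume that each ID has exactly one non-zero X2, as the question states.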