Merging rows into new columns using R

Time: 2016-05-07 14:31:02

Tags: r

I have a data frame that looks like this:

ID  X1  X2  X3  X4  X5  X6  X7  X8
202 1   0   895 114 17  21  1   4
202 2   0   130 399 74  19  4   4
202 3   0   364 112 48  12  5   4
202 4   4   104 012 83  81  0   4
203 1   0   895 112 76  49  1   5
203 2   2   950 815 32  35  4   5
203 3   0   3.4 156 69  14  5   5
203 4   0   868 025 71  20  0   5
204 2   0   801 398 51  44  4   8
204 4   4   205 000 14  24  0   8

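I want to put all the data for each ID on a single row. The number of rows per ID varies, and X1 identifies each of them. The X8 column is the same for every row of an ID. X2 contains only one non-zero value per ID, and that value is the only one I am interested in. If a value for a new column is not available, it can be set to 999. So I would like the final result to look like this: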

ID  X8  X2  X3_1    X4_1    X5_1    X6_1    X7_1    X3_2    X4_2    X5_2    X6_2    X7_2    X3_3    X4_3    X5_3    X6_3    X7_3    X3_4    X4_4    X5_4    X6_4    X7_4
202 4   4   895     114     17      21      1       130     399     74      19      4       364     112     48      12      5       104     12      83      81      0
203 5   2   895     112     76      49      1       950     815     32      35      4       3.4     156     69      14      5       868     25      71      20      0
204 8   4   999     999     999     999     999     801     398     51      44      4       999     999     999     999     999     205     0       14      24      0

I would like to do this in R. Thanks in advance for your help.

3 answers:

Answer 0: (score: 2)

Or use reshape2:

library(reshape2)
# Melt X3..X7 to long form, keeping ID, X1, X2 and X8 as identifiers
df.melt = melt(df, id.vars = c("ID", "X1", "X2", "X8"))
# Cast back to wide: one row per ID/X8, one column per variable and X1 value
df.cast = dcast(df.melt, ID + X8 ~ variable + X1, fill = 999)
df.cast
#   ID X8 X3_1 X3_2 X3_3 X3_4 X4_1 X4_2 X4_3 X4_4 X5_1 X5_2 X5_3 X5_4 X6_1 X6_2 X6_3 X6_4 X7_1 X7_2 X7_3 X7_4
#1 202  4  895  130  364  104  114  399  112   12   17   74   48   83   21   19   12   81    1    4    5    0
#2 203  5  895  950  430  868  112  815  156   25   76   32   69   71   49   35   14   20    1    4    5    0
#3 204  8  999  801  999  205  999  398  999    0  999   51  999   14  999   44  999   24  999    4  999    0

If needed, merge in X2:

# Attach the single non-zero X2 value for each ID
df.merge = merge(df.cast, df[df$X2 != 0, c("ID", "X2")], by = "ID", all.x = TRUE)
# Move X2 (the last column) to the second position
df.new = df.merge[, c(1, ncol(df.merge), 2:(ncol(df.merge) - 1))]
df.new
#   ID X2 X8 X3_1 X3_2 X3_3 X3_4 X4_1 X4_2 X4_3 X4_4 X5_1 X5_2 X5_3 X5_4 X6_1 X6_2 X6_3 X6_4 X7_1 X7_2 X7_3 X7_4
#1 202  4  4  895  130  364  104  114  399  112   12   17   74   48   83   21   19   12   81    1    4    5    0
#2 203  2  5  895  950  430  868  112  815  156   25   76   32   69   71   49   35   14   20    1    4    5    0
#3 204  4  8  999  801  999  205  999  398  999    0  999   51  999   14  999   44  999   24  999    4  999    0

Answer 1: (score: 1)

We can try:

 res <- Reduce(function(...) merge(..., by = c("ID", "X8"),
           all=TRUE), split(df1[-(2:3)], df1$X1))
 res[is.na(res)] <- 999
 res$X2 <- df1$X2[df1$X2!=0]
 colnames(res) <-make.unique(colnames(res))
 res[c(1:2, 23, 3:22)]
 #   ID X8 X2 X3.x X4.x X5.x X6.x X7.x X3.y X4.y X5.y X6.y X7.y X3.x.1 X4.x.1 X5.x.1 X6.x.1 X7.x.1 X3.y.1 X4.y.1 X5.y.1 X6.y.1 X7.y.1
 #1 202  4  4  895  114   17   21    1  130  399   74   19    4  364.0    112     48     12      5    104     12     83     81      0
 #2 203  5  2  895  112   76   49    1  950  815   32   35    4    3.4    156     69     14      5    868     25     71     20      0
 #3 204  8  4  999  999  999  999  999  801  398   51   44    4  999.0    999    999    999    999    205      0     14     24      0

Or we can use dcast from data.table, where value.var can take multiple columns:

 library(data.table)
 res1 <- dcast(setDT(df1), ID+X8~X1, value.var = paste0("X", 3:7), fill = 999)[, X2 := df1$X2[df1$X2!=0]]
 res1
 #    ID X8 X3_1 X3_2  X3_3 X3_4 X4_1 X4_2 X4_3 X4_4 X5_1 X5_2 X5_3 X5_4 X6_1 X6_2 X6_3 X6_4 X7_1 X7_2 X7_3 X7_4 X2
 #1: 202  4  895  130 364.0  104  114  399  112   12   17   74   48   83   21   19   12   81    1    4    5    0  4
 #2: 203  5  895  950   3.4  868  112  815  156   25   76   32   69   71   49   35   14   20    1    4    5    0  2
 #3: 204  8  999  801 999.0  205  999  398  999    0  999   51  999   14  999   44  999   24  999    4  999    0  4

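Answer 2: (score: 0)

This can be done with base R reshape() by (1) treating the ID and X8 columns as the idvar columns, (2) treating the X1 column as the timevar column, (3) merging on ID, after reshaping, with only the rows that have a non-zero X2, and (4) replacing NA with 999 after reshaping.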
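A minimal sketch of these four steps, assuming the question's data is read into a data frame named df. The names df and wide and the sep = "_" naming are choices made for this illustration, and this is not necessarily the answerer's exact code:

```r
# Example data from the question; values such as 012 are read as numbers (12).
df <- read.table(header = TRUE, text = "
ID  X1  X2  X3  X4  X5  X6  X7  X8
202 1   0   895 114 17  21  1   4
202 2   0   130 399 74  19  4   4
202 3   0   364 112 48  12  5   4
202 4   4   104 012 83  81  0   4
203 1   0   895 112 76  49  1   5
203 2   2   950 815 32  35  4   5
203 3   0   3.4 156 69  14  5   5
203 4   0   868 025 71  20  0   5
204 2   0   801 398 51  44  4   8
204 4   4   205 000 14  24  0   8
")

# (1) + (2): reshape to wide, with ID and X8 as idvar and X1 as timevar.
# X2 is dropped first so it is not spread across the X1 values.
wide <- reshape(df[, names(df) != "X2"],
                idvar = c("ID", "X8"), timevar = "X1",
                direction = "wide", sep = "_")

# (3): merge on ID with only the rows that carry the non-zero X2 value.
wide <- merge(wide, df[df$X2 != 0, c("ID", "X2")], by = "ID", all.x = TRUE)

# (4): cells with no source row come out as NA; replace them with 999.
wide[is.na(wide)] <- 999

# Put ID, X8 and X2 first, as in the layout the question asks for.
wide <- wide[, c("ID", "X8", "X2", setdiff(names(wide), c("ID", "X8", "X2")))]
wide
```

Because X8 is listed in idvar, it stays as a single column instead of being repeated for each X1 value. This relies on X8 being constant within an ID, as the question states.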
