Get all the numeric data in the last two columns, whose position varies by row

Time: 2016-03-22 16:25:08

Tags: r data.table

My sample data is as follows:

library(data.table)

dt1 <- setDT(structure(list(V1 = c(301L, 301L, 301L, 301L, 301L), V2 = 1:5, 
    V3 = c(61950L, 61951L, 61953L, 155220L, 155218L), V4 = c("i", 
    "you", "you", "we", "they"), V5 = c("believe", "think", "are", 
    "laugh", "smile"), V6 = c("we", "they", "okay", "490", "490"
    ), V7 = c("can", "500", "with", "31", "31"), V8 = c("use", 
    "32", "that", "", ""), V9 = c("datatable", "", "500", "", 
    ""), V10 = c("always", "", "32", "", ""), V11 = c("500", 
    "", "", "", ""), V12 = c("32", "", "", "", "")), .Names = c("V1", 
"V2", "V3", "V4", "V5", "V6", "V7", "V8", "V9", "V10", "V11", 
"V12"), row.names = c(NA, -5L), class = "data.frame"))

   V1 V2     V3   V4      V5   V6   V7   V8        V9    V10 V11 V12
1: 301  1  61950    i believe   we  can  use datatable always 500  32
2: 301  2  61951  you   think they  500   32                         
3: 301  3  61953  you     are okay with that       500     32        
4: 301  4 155220   we   laugh  490   31                              
5: 301  5 155218 they   smile  490   31  

I would like it converted to the following:

    V1 V2     V3                               newcol1 newcol2 newcol3
1: 301  1  61950 I believe we can use datatable always     500      32
2: 301  2  61951                        you think they     500      32
3: 301  3  61953                you are okay with that     500      32
4: 301  4 155220                              we laugh     490      31
5: 301  5 155218                            they smile     490      31

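Mechanics:

  • a) Columns V1, V2 and V3 in the sample data are always numeric and remain
    unchanged in the sample output
  • b) The last two columns of each row in the sample data are always numeric, but their position differs from row to row: in the sample data above, row 1 has V11 and V12 as its final columns, while row 2 has V7 and V8 as its last two columns
  • c) Between the first three numeric columns and the last two columns, the sample data contains text: for example, in row 1 columns V4:V10 are all text, and in row 2 columns V4:V6 are all text
  • d) None of the data cells are blank
  • e) The sample output must have the same first three columns as the sample data; newcol1 in the sample output simply merges that row's text columns
  • f) newcol2 and newcol3 in the sample output are always the last two numeric values of each row (note that their column positions differ from row to row)

Please suggest data.table solutions only.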

1 answer:

Answer 0: (score: 4)

You can do this:

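rowid_vars = c("V1","V2","V3")
# melt to long format, drop the empty padding cells, then summarise per original row
melt(dt1, id.vars=rowid_vars)[value!="", .(
  nc1 = paste(value[-(.N-1:0)], collapse=" "),  # everything except the last two values
  nc2 = as.integer(value[.N-1]),                # second-to-last value
  nc3 = as.integer(value[.N])                   # last value
), by=rowid_vars]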


    V1 V2     V3                                   nc1 nc2 nc3
1: 301  1  61950 i believe we can use datatable always 500  32
2: 301  2  61951                        you think they 500  32
3: 301  3  61953                you are okay with that 500  32
4: 301  4 155220                              we laugh 490  31
5: 301  5 155218                            they smile 490  31
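
To make the two steps of this approach easier to follow, here is a minimal sketch that splits them apart. The table dt, the names id_cols, long and res, and the sample values are hypothetical stand-ins chosen for illustration; they are not the question's data.

```r
library(data.table)

# Hypothetical stand-in table with the same shape as the question's data:
# three id columns, then text, then two numbers, then empty-string padding.
dt <- data.table(
  V1 = c(301L, 301L),
  V2 = 1:2,
  V3 = c(61950L, 61951L),
  V4 = c("i", "you"),
  V5 = c("believe", "think"),
  V6 = c("500", "they"),
  V7 = c("32", "490"),
  V8 = c("", "31")
)
id_cols <- c("V1", "V2", "V3")

# Step 1: reshape to long format (one row per cell) and drop the padding cells.
long <- melt(dt, id.vars = id_cols)[value != ""]

# Step 2: within each original row, the last two remaining values are the
# numbers; everything before them is text to be pasted together.
res <- long[, {
  n <- .N
  .(nc1 = paste(head(value, n - 2L), collapse = " "),
    nc2 = as.integer(value[n - 1L]),
    nc3 = as.integer(value[n]))
}, by = id_cols]

print(res)
```

Inspecting long before the grouping step shows why this works: within each group the values stay in column order (V4, V5, ...), so once the empty strings are removed the two numbers are always at the end.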

I guess there is some way to read the data in that would avoid this problem, but I don't know how.