Concatenate rows of a dataset in R, grouped by another column

Date: 2018-04-17 15:47:27

Tags: r loops dataset

I have a dataset in this format:

[image]

I need to convert it to:

[image]

Can I do this without loops?

Edit:

Hi, and thanks again @guscht. I tried to use your example, but I ran into a problem: I need to run it in Power BI on my own dataset.

This is what I am trying:

library(data.table)

hec1 <- as.data.table(dataset)
res <- hec1[,strsplit(observaciones, split = ";"),by = c("albaran", "fecha", "cliente", "estado", "descrip", "destinatario", "direccion", "cp", "poblacion")]
res[, tipo_pedido := substring(observaciones, 1, regexpr(":", observaciones)-2)][, entregas := substring(observaciones, regexpr(":", observaciones)+2, nchar(observaciones))]
res$V1 <- NULL
res <- res[,strsplit(entregas, split = ","),by = c("albaran", "fecha", "cliente", "estado", "descrip", "destinatario", "direccion", "cp", "poblacion", "tipo_pedido")]
setnames(res, "tipo_pedido", "entregas")
res

But it does not work and fails with this error:

Error in strsplit(observaciones, split = ";") : 
  argumento de tipo no-carácter
Calls: [ -> [.data.table -> strsplit
Ejecución interrumpida

I think it might be a problem with the original format? It is a data table.

1 answer:

Answer 0: (score: 0)

With the data.table package, you can do the following:

library(data.table)

# Sample data: one row per (date_time, something, number, user)
dt <- fread(input = '
16/04/2018 23:18|Estrella Disney|1|sandy crespo
16/04/2018 23:18|Estrella Disney|2|sandy crespo
16/04/2018 23:18|Estrella Disney|3|sandy crespo
16/04/2018 23:18|Estrella Disney|4|sandy crespo
16/04/2018 23:18|Estrella Disney|5|sandy crespo
16/04/2018 23:18|Estrella Disney|6|sandy crespo
16/04/2018 23:18|Colleccion|20|sandy crespo
16/04/2018 23:18|Colleccion|4|sandy crespo
', sep = '|', header = FALSE)
setnames(dt, c('date_time', 'something', 'number', 'user'))

# First collapse the numbers per group, then collapse the groups per user and date
res <- dt[, paste(number, collapse = ", "), by = c("something", "user", "date_time")][, paste(something, ":", V1, collapse = "; "), by = c("user", "date_time")]
# Restore the original column order
res <- res[, c('date_time', 'V1', 'user'), with = FALSE]
res
          date_time                                                 V1         user
1: 16/04/2018 23:18 Estrella Disney : 1, 2, 3, 4, 5, 6; Colleccion : 20, 4 sandy crespo

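Essentially, this approach chains two paste(..., collapse = ) calls to build the column you want. The first joins the numbers within each something group; the second joins the resulting "something : numbers" strings for each user and date_time. The by argument lists the columns that are left unmodified and define the groups.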
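To make the two collapse steps easier to follow, here is a minimal sketch that runs them separately and keeps the intermediate result. The column names numbers and summary and the variables step1 and step2 are illustrative and not part of the answer's code; the data is the same sample, built with data.table() instead of fread().

```r
library(data.table)

# Same sample data as above, built directly instead of parsed from text.
dt <- data.table(
  date_time = "16/04/2018 23:18",
  something = c(rep("Estrella Disney", 6), rep("Colleccion", 2)),
  number    = c(1:6, 20L, 4L),
  user      = "sandy crespo"
)

# Step 1: one row per (something, user, date_time); numbers joined with ", ".
step1 <- dt[, .(numbers = paste(number, collapse = ", ")),
            by = .(something, user, date_time)]

# Step 2: one row per (user, date_time); "something : numbers" pieces joined with "; ".
step2 <- step1[, .(summary = paste(something, ":", numbers, collapse = "; ")),
               by = .(user, date_time)]

step1
step2
```

Naming the result columns inside .() avoids relying on the automatic V1 name, which can make a longer chain easier to read.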

Edit: I changed the code above to add a ":" between something and the numbers. To reverse the process, you can do the following:

# Split on the same separators used when collapsing ("; " and ", ") to avoid leading spaces
res <- res[, strsplit(V1, split = "; "), by = c("user", "date_time")]
res[, something := substring(V1, 1, regexpr(":", V1) - 2)][, number := substring(V1, regexpr(":", V1) + 2, nchar(V1))]
res[, V1 := NULL]
res <- res[, strsplit(number, split = ", "), by = c("user", "date_time", "something")]
setnames(res, "V1", "number")
res
           user        date_time       something number
1: sandy crespo 16/04/2018 23:18 Estrella Disney      1
2: sandy crespo 16/04/2018 23:18 Estrella Disney      2
3: sandy crespo 16/04/2018 23:18 Estrella Disney      3
4: sandy crespo 16/04/2018 23:18 Estrella Disney      4
5: sandy crespo 16/04/2018 23:18 Estrella Disney      5
6: sandy crespo 16/04/2018 23:18 Estrella Disney      6
7: sandy crespo 16/04/2018 23:18      Colleccion     20
8: sandy crespo 16/04/2018 23:18      Colleccion      4
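Regarding the error in the question's edit: "argumento de tipo no-carácter" is the Spanish-locale form of strsplit's "non-character argument" error, which is raised whenever the input is not a character vector. One possible cause, which I cannot confirm from the post, is that observaciones arrives as a factor. The sketch below assumes exactly that; the dataset here is a hypothetical stand-in with invented values, and only the column names albaran and observaciones are taken from the question.

```r
library(data.table)

# Hypothetical stand-in for the Power BI `dataset`; the values are invented.
dataset <- data.frame(
  albaran       = c("A1", "A2"),
  observaciones = c("Estrella Disney : 1, 2; Colleccion : 20, 4",
                    "Colleccion : 7"),
  stringsAsFactors = TRUE   # assumption: the text column is a factor
)

hec1 <- as.data.table(dataset)

# strsplit() only accepts character vectors, so splitting the factor fails.
failed <- inherits(
  try(strsplit(hec1$observaciones, split = "; "), silent = TRUE),
  "try-error"
)

# Coerce the column to character before splitting.
hec1[, observaciones := as.character(observaciones)]

# One row per "; "-separated piece, grouped by the identifying column.
res <- hec1[, .(grupo = unlist(strsplit(observaciones, split = "; ", fixed = TRUE))),
            by = albaran]
res
```

Checking class(dataset$observaciones) first would show whether this assumption holds for the real data. Note also that after the first strsplit the pieces live in V1, so later steps should refer to V1 (or a name you assign, such as the hypothetical grupo here) rather than observaciones.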