I have a dataset in this format:
I need to convert it to:
Can I do this without a loop?
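Edit:
Hi, and thanks again @guscht. I tried to use your example but ran into some problems: I need to use it in Power BI, on my own dataset.
This is what I am trying: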
library(data.table)
hec1 <- as.data.table(dataset)
res <- hec1[,strsplit(observaciones, split = ";"),by = c("albaran", "fecha", "cliente", "estado", "descrip", "destinatario", "direccion", "cp", "poblacion")]
res[, tipo_pedido := substring(observaciones, 1, regexpr(":", observaciones)-2)][, entregas := substring(observaciones, regexpr(":", observaciones)+2, nchar(observaciones))]
res$V1 <- NULL
res <- res[,strsplit(entregas, split = ","),by = c("albaran", "fecha", "cliente", "estado", "descrip", "destinatario", "direccion", "cp", "poblacion", "tipo_pedido")]
setnames(res, "tipo_pedido", "entregas")
res
But it does not work and gives me this error:
Error in strsplit(observaciones, split = ";") :
argumento de tipo no-carácter
Calls: [ -> [.data.table -> strsplit
Ejecución interrumpida
I think it might be a problem with the original format? It is a data table.
Answer 0 (score: 0)
Using the data.table package, you can do the following:
library(data.table)
dt <- fread(input = '
16/04/2018 23:18|Estrella Disney|1|sandy crespo
16/04/2018 23:18|Estrella Disney|2|sandy crespo
16/04/2018 23:18|Estrella Disney|3|sandy crespo
16/04/2018 23:18|Estrella Disney|4|sandy crespo
16/04/2018 23:18|Estrella Disney|5|sandy crespo
16/04/2018 23:18|Estrella Disney|6|sandy crespo
16/04/2018 23:18|Colleccion|20|sandy crespo
16/04/2018 23:18|Colleccion|4|sandy crespo
', sep = '|')
setnames(dt, c('date_time', 'something', 'number', 'user'))
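# Collapse the numbers per group first, then collapse the "something : numbers" pieces per user and date.
res <- dt[, paste(number, collapse = ", "), by = c("something", "user", "date_time")][, paste(something, ":", V1, collapse = "; "), by = c("user", "date_time")]
# Reorder the columns; with = FALSE selects them by name.
res <- res[, c('date_time', 'V1', 'user'), with = FALSE]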
res
date_time V1 user
1: 16/04/2018 23:18 Estrella Disney : 1, 2, 3, 4, 5, 6; Colleccion : 20, 4 sandy crespo
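Essentially, this approach uses two collapse statements to build the column you want. The first produces the concatenated numbers; the second joins each value of something with its concatenated numbers. The by argument only specifies the columns that are kept unmodified.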
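To make the two steps easier to follow, the chained call can be split so that the intermediate table can be inspected. This is only a sketch on the same sample rows; the names step1, step2 and numbers are illustrative and do not appear in the code above.

```r
library(data.table)

# Same sample rows as above, built directly instead of via fread().
dt <- data.table(
  date_time = "16/04/2018 23:18",
  something = c(rep("Estrella Disney", 6), rep("Colleccion", 2)),
  number    = c(1:6, 20L, 4L),
  user      = "sandy crespo"
)

# First collapse: one row per (something, user, date_time), numbers joined by ", ".
step1 <- dt[, .(numbers = paste(number, collapse = ", ")),
            by = c("something", "user", "date_time")]

# Second collapse: one row per (user, date_time), "label : numbers" pieces joined by "; ".
step2 <- step1[, .(V1 = paste(something, ":", numbers, collapse = "; ")),
               by = c("user", "date_time")]

step1$numbers
step2$V1
```

Naming the result columns inside .() avoids relying on the automatic V1 name between the two steps.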
Edit: I changed the code above to add a : between something and the numbers.
To reverse the process, you can do the following:
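# Split on "; " (with the space) so the second label does not keep a leading blank.
res <- res[, strsplit(V1, split = "; ", fixed = TRUE), by = c("user", "date_time")]
# Text before " : " is the label; text after it is the comma-separated numbers.
res[, something := substring(V1, 1, regexpr(":", V1) - 2)][, number := substring(V1, regexpr(":", V1) + 2, nchar(V1))]
res[, V1 := NULL]
# Split on ", " so each number comes out without a leading blank.
res <- res[, strsplit(number, split = ", ", fixed = TRUE), by = c("user", "date_time", "something")]
setnames(res, "V1", "number")
res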
user date_time something number
1: sandy crespo 16/04/2018 23:18 Estrella Disney 1
2: sandy crespo 16/04/2018 23:18 Estrella Disney 2
3: sandy crespo 16/04/2018 23:18 Estrella Disney 3
4: sandy crespo 16/04/2018 23:18 Estrella Disney 4
5: sandy crespo 16/04/2018 23:18 Estrella Disney 5
6: sandy crespo 16/04/2018 23:18 Estrella Disney 6
7: sandy crespo 16/04/2018 23:18 Colleccion 20
8: sandy crespo 16/04/2018 23:18 Colleccion 4
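Regarding the error in the question's edit: strsplit() only accepts character vectors, so "argumento de tipo no-carácter" (non-character argument) suggests that observaciones arrives as a factor or some other non-character type. That is an assumption, since the Power BI dataset is not shown. The sketch below uses invented data, a shortened by list (albaran, cliente) and assumes the same "label : n, n; label : n" layout as in the example above. It converts the column to character first and then parses V1, because after the first strsplit() the pieces are in V1 and observaciones no longer exists.

```r
library(data.table)

# Hypothetical stand-in for the Power BI `dataset`; all values are invented.
dataset <- data.frame(
  albaran       = "A001",
  cliente       = "sandy crespo",
  observaciones = "Estrella Disney : 1, 2; Colleccion : 20, 4",
  stringsAsFactors = TRUE   # mimics a text column that arrives as a factor
)

hec1 <- as.data.table(dataset)
keys <- c("albaran", "cliente")   # shortened here; use the full list of id columns

# Convert to character before splitting; strsplit() rejects factors.
hec1[, observaciones := as.character(observaciones)]

res <- hec1[, strsplit(observaciones, split = "; ", fixed = TRUE), by = keys]
# The split pieces are in V1, so parse V1 rather than observaciones.
res[, tipo_pedido := substring(V1, 1, regexpr(":", V1) - 2)]
res[, entregas := substring(V1, regexpr(":", V1) + 2, nchar(V1))]
res[, V1 := NULL]

res <- res[, strsplit(entregas, split = ", ", fixed = TRUE), by = c(keys, "tipo_pedido")]
setnames(res, "V1", "entregas")
res
```

If observaciones is already character in your data, the as.character() call is harmless, and the remaining difference from the attempt in the question is the use of V1 in the parsing step and in setnames().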