Structuring transactional data for a Sankey diagram

Time: 2019-03-23 01:24:10

Tags: r plyr sankey-diagram

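There are many packages for drawing Sankey diagrams. However, these packages assume the data is already structured. I am looking at a transactional dataset, and I want to extract the first sequence of products in the time series from it. Assume the time series is already sorted.

Here is the dataset: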


image

Here is the desired output:

image1

1 answer:

Answer 0: (score: 0)

Here is my suggestion:

dt <- structure(list(
  date = structure(c(1546300800, 1546646400, 1547510400, 1547596800,
                     1546387200, 1546646400, 1546732800),
                   class = c("POSIXct", "POSIXt"), tzone = "UTC"),
  client = c("a", "a", "a", "a", "b", "b", "b"),
  product = c("butter", "cheese", "cheese", "butter", "milk", "garbage bag", "candy"),
  qty = c(2, 3, 4, 1, 3, 4, 6)),
  row.names = c(NA, -7L), class = c("tbl_df", "tbl", "data.frame"))

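library(data.table)

dt <- as.data.table(dt)
# Previous product bought by the same client (rows are assumed sorted by date)
dt[, From := shift(product, type = "lag"), by = client]
# The first purchase of each client has no predecessor, so drop it
dt <- dt[!is.na(From)]

setnames(dt, "product", "To")
# Drop repeat purchases of the same product (e.g. cheese -> cheese)
dt <- dt[From != To]
setcolorder(dt, c("client", "From", "To", "qty"))
# Direction-independent key, so A -> B and B -> A count as the same pair
dt[, comp := paste0(sort(c(From, To)), collapse = "_"), by = seq_len(nrow(dt))]
# Keep the first occurrence of each pair (note: across all clients)
dt <- unique(dt, by = "comp")

# Remove the helper columns
dt[, date := NULL]
dt[, comp := NULL]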

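One caveat: why is the cheese-to-cheese transition removed? I am assuming you are looking for sequences of distinct products. If it is for some other reason, my code may need some adjustment.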
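If the assumption about distinct products does not hold, the two filtering steps can be made optional. The following is only a sketch in base R: the helper `filter_transitions` and its arguments `drop_self` and `undirected` are hypothetical names, and the `tr` table is the set of per-client transitions from the sample data, retyped by hand so the snippet runs on its own.

```r
# Transitions per client after the lag step (retyped from the sample data).
tr <- data.frame(
  client = c("a", "a", "a", "b", "b"),
  From   = c("butter", "cheese", "cheese", "milk", "garbage bag"),
  To     = c("cheese", "cheese", "butter", "garbage bag", "candy"),
  qty    = c(3, 4, 1, 4, 6),
  stringsAsFactors = FALSE
)

# Hypothetical helper: both filters can be switched off independently.
filter_transitions <- function(tr, drop_self = TRUE, undirected = TRUE) {
  if (drop_self) {
    # Remove repeat purchases such as cheese -> cheese.
    tr <- tr[tr$From != tr$To, ]
  }
  if (undirected) {
    # Same key for A -> B and B -> A.
    key <- paste(pmin(tr$From, tr$To), pmax(tr$From, tr$To), sep = "_")
  } else {
    # Direction matters: A -> B and B -> A are kept separately.
    key <- paste(tr$From, tr$To, sep = "_")
  }
  # Keep the first occurrence of each key.
  tr[!duplicated(key), ]
}

default_result <- filter_transitions(tr)
all_rows <- filter_transitions(tr, drop_self = FALSE, undirected = FALSE)
```

With the defaults this keeps the same three rows as the data.table code; with both filters off, all five transitions survive, including cheese to cheese and cheese to butter.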

#  client        From          To qty       
#      a      butter      cheese   3 
#      b        milk garbage bag   4 
#      b garbage bag       candy   6
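Most Sankey packages want a node table plus a link table that refers to nodes by index. As a rough sketch, assuming the networkD3 package is the target (the question does not name one), the From/To result can be converted as follows. The `links` table is retyped from the output above, and `nodes`, `source` and `target` are names chosen here for illustration.

```r
# Result of the answer, retyped so the snippet runs on its own.
links <- data.frame(
  From = c("butter", "milk", "garbage bag"),
  To   = c("cheese", "garbage bag", "candy"),
  qty  = c(3, 4, 6),
  stringsAsFactors = FALSE
)

# One node per distinct product, in order of first appearance.
nodes <- data.frame(name = unique(c(links$From, links$To)),
                    stringsAsFactors = FALSE)

# networkD3 expects zero-based node indices, hence the "- 1L".
links$source <- match(links$From, nodes$name) - 1L
links$target <- match(links$To, nodes$name) - 1L

# Build the widget only if networkD3 is installed (assumption: it may not be).
if (requireNamespace("networkD3", quietly = TRUE)) {
  p <- networkD3::sankeyNetwork(Links = links, Nodes = nodes,
                                Source = "source", Target = "target",
                                Value = "qty", NodeID = "name")
  # Evaluating p at the console or in a notebook renders the diagram.
}
```

The client column is left out here because a plain Sankey link only carries source, target and value; if flows should be split per client, the links would need to be built per client or aggregated first.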