Question

我正试图用R中的Sankey Diagram可视化我的数据流。

我发现this blog post链接到生成Sankey图的R脚本，遗憾的是它非常原始且有些局限（请参阅下面的示例代码和数据）。

有没有人知道其他脚本 - 或者甚至是一个更开发的包？我的最终目标是通过图表组件的相对大小来显示数据流和百分比，例如these examples of Sankey Diagrams。

我发布了a somewhat similar question on the r-help list，但两周后没有任何回复，我在堆栈溢出时试试运气。

谢谢，埃里克

PS。我知道Parallel Sets Plot，但这不是我想要的。

# thanks to, https://tonybreyal.wordpress.com/2011/11/24/source_https-sourcing-an-r-script-from-github/
  sourc.https     <- function(url, ...) {
# install and load the RCurl package 
if (match('RCurl', nomatch=0, installed.packages()[,1])==0) {
  install.packages(c("RCurl"), dependencies = TRUE)
  require(RCurl)  
} else require(RCurl)    

# parse and evaluate each .R script
  sapply(c(url, ...), function(u) {
    eval(parse(text = getURL(u, followlocation = TRUE, 
    cainfo  = system.file("CurlSSL", "cacert.pem", 
    package = "RCurl"))), envir = .GlobalEnv)
 } )
 }

# from https://gist.github.com/1423501
sourc.https("https://raw.github.com/gist/1423501/55b3c6f11e4918cb6264492528b1ad01c429e581/Sankey.R")

# My example (there is another example inside Sankey.R):
inputs = c(6, 144)
losses = c(6,47,14,7, 7, 35, 34)
unit = "n ="

labels = c("Transfers",
           "Referrals\n",
           "Unable to Engage",
           "Consultation only",
           "Did not complete the intake",
           "Did not engage in Treatment",
           "Discontinued Mid-Treatment",
           "Completed Treatment",
           "Active in \nTreatment")

SankeyR(inputs,losses,unit,labels)

# Clean up my mess
rm("inputs", "labels", "losses", "SankeyR", "sourc.https", "unit")

使用上面的代码 Sankey Diagram produced with the code above

生成的Sankey图表

Answer 1

可以通过networkD3包创建此图。它允许您创建交互式sankey图表。在这里你可以找到example。我还添加了一个截图，让你知道它的样子。

# Load package
library(networkD3)

# Load energy projection data
# Load energy projection data
URL <- paste0(
        "https://cdn.rawgit.com/christophergandrud/networkD3/",
        "master/JSONdata/energy.json")
Energy <- jsonlite::fromJSON(URL)
# Plot
sankeyNetwork(Links = Energy$links, Nodes = Energy$nodes, Source = "source",
             Target = "target", Value = "value", NodeID = "name",
             units = "TWh", fontSize = 12, nodeWidth = 30)

enter image description here

Answer 2

我创建了一个包（riverplot），与Sankey函数相比，它具有略微不同但重叠的功能，并且可以生成如下图：

enter image description here

Answer 3

如果你想用R做，你的最佳出价似乎是@Roman建议 - 破解SankeyR功能。例如 - 下面是我非常快速的修复 - 只需将标签定向为verticaly，slighlty偏移它们并减少输入referals的字体以使其看起来更好一些。此修改仅更改SankeyR函数中的第171和223行：

    #line171 - change oversized font size of input label
    fontsize = max(0.5,frInputs[j]*1.5)#1.5 instead of 2.5 

    #line223 - srt changes from 35 to 90 to orient labels vertically, 
    #and offset adjusts them to get better alignment with arrows
    text(txtX, txtY, fullLabel, cex=fontsize, pos=4, srt=90, offset=0.1)

enter image description here

我不是三角学的王牌，但这确实是你改变箭头方向所需要的。在我看来这是理想的 - 如果你可以调整松动箭头，使它们水平定向而不是垂直定向。否则，为什么我的解决方案解决了标签方向的问题，它不会使图表更具可读性......

Answer 4

除了rCharts之外，现在还可以使用googleVis（版本＆gt; = 0.5.0）在R中生成 Sankey 图表。例如，此post描述了使用googleVis生成以下图表： enter image description here

Answer 5

R＆＃39; alluvial包也会执行此操作（来自?alluvial）。

# install.packages(c("alluvial"), dependencies = TRUE)
require(alluvial)

# Titanic data
tit <- as.data.frame(Titanic)

# 4d
alluvial( tit[,1:4], freq=tit$Freq, border=NA,
     hide = tit$Freq < quantile(tit$Freq, .50),
     col=ifelse( tit$Class == "3rd" & tit$Sex == "Male", "red", "gray") )

Answer 6

plotly与networkD3包（example link）功能相同。

Answer 7

根据these definitions判断，此函数与并行集图一样，缺乏分割和组合流的能力（即通过多个转换）。

自Sankey diagrams are directed weighted graphs起，像qgraph这样的包可能会有用。

SankeyR函数提供更清晰的标签，如果您按照降序对损失进行排序，因为文本放置在靠近箭头的位置而不会重叠。

Answer 8

查看//sankeybuilder.com因为它提供了随时可用的解决方案，您可以随时上传数据和播放变体。转换效果很好（类似于您问题中的youtube演示）。如果您加载SankeyTrend演示，它包含许多时间段（数据年数）。加载后（自动构建sankeys），单击页面右上角的播放按钮以播放时隙，甚至可以暂停和恢复时间。演示网址在这里：SankeyTrend希望这有助于您寻找完美的Sankey图表。

Answer 9

为完整性起见，还提供了ggalluvial软件包，它是用于冲积/桑基图的# devtools::install_github("corybrunson/ggalluvial", ref = "optimization") library(ggalluvial) titanic_wide <- data.frame(Titanic) ggplot(data = titanic_wide, aes(axis1 = Class, axis2 = Sex, axis3 = Age, y = Freq)) + scale_x_discrete(limits = c("Class", "Sex", "Age"), expand = c(.1, .05)) + xlab("Demographic") + geom_alluvium(aes(fill = Survived)) + geom_stratum() + geom_text(stat = "stratum", label.strata = TRUE) + theme_minimal() + ggtitle("passengers on the maiden voyage of the Titanic", "stratified by demographics and survival") + theme(legend.position = 'bottom')。

以下是包装文件中的示例

ggplot(titanic_wide,
       aes(y = Freq,
           axis1 = Survived, axis2 = Sex, axis3 = Class)) +
  geom_alluvium(aes(fill = Class),
                width = 0, knot.pos = 0, reverse = FALSE) +
  guides(fill = FALSE) +
  geom_stratum(width = 1/8, reverse = FALSE) +
  geom_text(stat = "stratum", label.strata = TRUE, reverse = FALSE) +
  scale_x_continuous(expand = c(0, 0), 
                     breaks = 1:3, labels = c("Survived", "Sex", "Class")) +
  scale_y_discrete(expand = c(0, 0)) +
  coord_flip() +
  ggtitle("Titanic survival by class and sex")

def test(x, y):
    if x == 0:
        sum = 0
        for i in y:
            sum += i + sum
        return sum
    elif x % 2 == 0:
        sum = 0
        for i in y:
            sum += i * 2
        return sum
    else:
        sum = 0
        for i in y:
            sum += i - 1
        return sum

^{由reprex package（v0.2.1.9000）于2018-11-13创建}

Answer 10

刚开源的包使用冲积图来可视化工作流阶段。由于在使用冲积形式时保留了历史记录，因此边缘没有任何交叉。

https://github.com/claytontstanley/shiny.alluvial

R的Sankey Diagrams？

10 个答案: