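This may sound like a very broad question, but if you let me describe some details, I can assure you it is very specific. It is also frustrating, discouraging and infuriating.
The following plot describes a Scottish election and is based on code from plot.ly:
Plot 1:
Dataset 1: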
data = [['Source','Target','Value','Color','Node, Label','Link Color'],
[0,5,20,'#F27420','Remain+No – 28','rgba(253, 227, 212, 0.5)'],
[0,6,3,'#4994CE','Leave+No – 16','rgba(242, 116, 32, 1)'],
[0,7,5,'#FABC13','Remain+Yes – 21','rgba(253, 227, 212, 0.5)'],
[1,5,14,'#7FC241','Leave+Yes – 14','rgba(219, 233, 246, 0.5)'],
[1,6,1,'#D3D3D3','Didn’t vote in at least one referendum – 21','rgba(73, 148, 206, 1)'],
[1,7,1,'#8A5988','46 – No','rgba(219, 233, 246,0.5)'],
[2,5,3,'#449E9E','39 – Yes','rgba(250, 188, 19, 1)'],
[2,6,17,'#D3D3D3','14 – Don’t know / would not vote','rgba(250, 188, 19, 0.5)'],
[2,7,2,'','','rgba(250, 188, 19, 0.5)'],
[3,5,3,'','','rgba(127, 194, 65, 1)'],
[3,6,9,'','','rgba(127, 194, 65, 0.5)'],
[3,7,2,'','','rgba(127, 194, 65, 0.5)'],
[4,5,5,'','','rgba(211, 211, 211, 0.5)'],
[4,6,9,'','','rgba(211, 211, 211, 0.5)'],
[4,7,8,'','','rgba(211, 211, 211, 0.5)']
]
How the plot is built:
I have gathered some important details about the behavior of sankey charts from various sources, such as:
Sankey automatically orders the categories to minimize the amount of overlap.
Links are assigned in the order they appear in the dataset (row-wise).
Node colors are assigned in the order the plot is built.
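The challenge:
As you will see in the details below, the nodes, labels and colors are not displayed in the same order in which the source dataframe is built. Some of it makes perfect sense, since you have various elements that describe the same node, such as color, target, value and link color. One node, 'Remain+No – 28', looks like this:
The corresponding part of the dataset looks like this: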
[0,5,20,'#F27420','Remain+No – 28','rgba(253, 227, 212, 0.5)'],
[0,6,3,'#4994CE','Leave+No – 16','rgba(242, 116, 32, 1)'],
[0,7,5,'#FABC13','Remain+Yes – 21','rgba(253, 227, 212, 0.5)'],
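So this part of the source describes a node [0] with three corresponding targets [5, 6, 7] and three links with the values [20, 3, 5]. '#F27420' is the orange color of the node, and the colors 'rgba(253, 227, 212, 0.5)', 'rgba(242, 116, 32, 1)' and 'rgba(253, 227, 212, 0.5)' describe the colors of the links from the node to some of the targets. So far, the information that has not been used in the example above is: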
Data sample 2 (partial)
[-,-,-,'-------','---------------','-------------------'],
[-,-,-,'#4994CE','Leave+No – 16','-------------------'],
[-,-,-,'#FABC13','Remain+Yes – 21','-------------------'],
That information is then used for the remaining elements of the chart.
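The grouping described above (one row per link, so node 0 "owns" the rows whose Source is 0) can be checked with pandas. This is a minimal sketch using only the three rows quoted above; the variable names are illustrative. It also hints at the catch the answer below points out: plotly reads the Color and Label columns as a separate node list indexed by position, not as attributes of the row's own source node.

```python
import pandas as pd

# The three rows quoted above: all links leaving node 0
rows = [
    [0, 5, 20, '#F27420', 'Remain+No – 28', 'rgba(253, 227, 212, 0.5)'],
    [0, 6, 3, '#4994CE', 'Leave+No – 16', 'rgba(242, 116, 32, 1)'],
    [0, 7, 5, '#FABC13', 'Remain+Yes – 21', 'rgba(253, 227, 212, 0.5)'],
]
df = pd.DataFrame(rows, columns=['Source', 'Target', 'Value', 'Color', 'Node, Label', 'Link Color'])

# Every row is one link; select the links whose Source index is 0
links_from_0 = df.loc[df['Source'] == 0, ['Target', 'Value', 'Link Color']]
print(links_from_0['Target'].tolist())  # [5, 6, 7]
print(links_from_0['Value'].tolist())   # [20, 3, 5]

# The label column is a positional node list: entry i labels node i,
# whatever link happens to sit on row i
node_labels = df['Node, Label'].tolist()
print(node_labels[0])  # Remain+No – 28
```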
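So, what is the problem? As you will see in more detail below, everything makes sense as long as each new row in the dataset inserts a new link and, if the other information (color, label) has not been used yet, applies it to the corresponding elements. I'll explain in more detail using two screenshots, with the plot on the left and the code on the right.
The following data sample produces the plot below, following the logic described above:
Data sample 3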
data = [['Source','Target','Value','Color','Node, Label','Link Color'],
[0,5,20,'#F27420','Remain+No – 28','rgba(253, 227, 212, 0.5)'],
[0,6,3,'#4994CE','Leave+No – 16','rgba(242, 116, 32, 1)'],
[0,7,5,'#FABC13','Remain+Yes – 21','rgba(253, 227, 212, 0.5)'],
[1,5,14,'#7FC241','Leave+Yes – 14','rgba(219, 233, 246, 0.5)'],
[1,6,1,'#D3D3D3','Didn’t vote in at least one referendum – 21','rgba(73, 148, 206, 1)']]
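Screenshot 1 - partial plot with data sample 3
The problem:
Adding the row [1,7,1,'#8A5988','46 – No','rgba(219, 233, 246,0.5)'] to the dataset creates a new link between source [5] and target [7], but at the same time applies a color and a label to target 5. I would have thought that the next label to be applied to the chart would be 'Remain+Yes – 21', since it hasn't been used yet. What happens instead is that the label '46 – No' is applied to target 5. Why?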
Screenshot 2 - partial plot with data sample 3 + [1,7,1,'#8A5988','46 – No','rgba(219, 233, 246,0.5)']
How can you tell from that dataframe what is a source and what is a target?
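For a flat link table like this one, the roles can be read off with set arithmetic on the two index columns: a node that appears only in Source is a pure source, one that appears only in Target is a pure target, and one that appears in both is an intermediate node. classify_nodes is a hypothetical helper, and the data are the first six rows of the dataset:

```python
import pandas as pd

def classify_nodes(links: pd.DataFrame) -> dict:
    """Hypothetical helper: group node indices by how the links use them."""
    sources = set(links['Source'].tolist())
    targets = set(links['Target'].tolist())
    return {
        'only_source': sorted(sources - targets),
        'only_target': sorted(targets - sources),
        'both': sorted(sources & targets),
    }

# First six rows of the dataset (Source, Target, Value only)
links = pd.DataFrame({
    'Source': [0, 0, 0, 1, 1, 1],
    'Target': [5, 6, 7, 5, 6, 7],
    'Value':  [20, 3, 5, 14, 1, 1],
})
print(classify_nodes(links))
# {'only_source': [0, 1], 'only_target': [5, 6, 7], 'both': []}
```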
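I know this question is both strange and hard to answer, but I'm hoping someone has a suggestion. I also know that a dataframe may not be the best source for a sankey chart. Perhaps JSON instead?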
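On the JSON question: one option, sketched here, is to split the flat table into a separate node list and link list, which maps directly onto a JSON document and onto the node= and link= dicts of a sankey trace. The helper split_flat_rows is hypothetical, and it assumes, as in the dataset above, that the label/color columns of the first rows describe nodes 0, 1, 2, ... in order and that blank labels only appear after the last real node:

```python
import json

# Flat rows from the question: Source, Target, Value, Color, Label, Link Color
flat = [
    [0, 5, 20, '#F27420', 'Remain+No – 28', 'rgba(253, 227, 212, 0.5)'],
    [0, 6, 3, '#4994CE', 'Leave+No – 16', 'rgba(242, 116, 32, 1)'],
    [0, 7, 5, '#FABC13', 'Remain+Yes – 21', 'rgba(253, 227, 212, 0.5)'],
    [1, 5, 14, '#7FC241', 'Leave+Yes – 14', 'rgba(219, 233, 246, 0.5)'],
    [1, 6, 1, '#D3D3D3', 'Didn’t vote in at least one referendum – 21', 'rgba(73, 148, 206, 1)'],
    [1, 7, 1, '#8A5988', '46 – No', 'rgba(219, 233, 246,0.5)'],
    [2, 5, 3, '#449E9E', '39 – Yes', 'rgba(250, 188, 19, 1)'],
    [2, 6, 17, '#D3D3D3', '14 – Don’t know / would not vote', 'rgba(250, 188, 19, 0.5)'],
    [2, 7, 2, '', '', 'rgba(250, 188, 19, 0.5)'],
    [3, 5, 3, '', '', 'rgba(127, 194, 65, 1)'],
    [3, 6, 9, '', '', 'rgba(127, 194, 65, 0.5)'],
    [3, 7, 2, '', '', 'rgba(127, 194, 65, 0.5)'],
    [4, 5, 5, '', '', 'rgba(211, 211, 211, 0.5)'],
    [4, 6, 9, '', '', 'rgba(211, 211, 211, 0.5)'],
    [4, 7, 8, '', '', 'rgba(211, 211, 211, 0.5)'],
]

def split_flat_rows(rows):
    """Hypothetical helper: one node per non-blank label, one link per row."""
    nodes = [{"label": r[4], "color": r[3]} for r in rows if r[4] != '']
    links = [{"source": r[0], "target": r[1], "value": r[2], "color": r[5]} for r in rows]
    return {"nodes": nodes, "links": links}

sankey = split_flat_rows(flat)
print(len(sankey["nodes"]), len(sankey["links"]))  # 8 15
print(json.dumps(sankey["nodes"][5], ensure_ascii=False))
# {"label": "46 – No", "color": "#8A5988"}

# Same shape as the node= and link= dicts in the question's data_trace
node = dict(label=[n["label"] for n in sankey["nodes"]],
            color=[n["color"] for n in sankey["nodes"]])
link = dict(source=[lk["source"] for lk in sankey["links"]],
            target=[lk["target"] for lk in sankey["links"]],
            value=[lk["value"] for lk in sankey["links"]],
            color=[lk["color"] for lk in sankey["links"]])
print(max(link["target"]) < len(node["label"]))  # True
```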
Complete code and data sample, for easy copying and pasting into a Jupyter Notebook:
import pandas as pd
import numpy as np
import plotly.graph_objs as go
from plotly.offline import download_plotlyjs, init_notebook_mode, plot, iplot
init_notebook_mode(connected=True)
# Original data
data = [['Source','Target','Value','Color','Node, Label','Link Color'],
[0,5,20,'#F27420','Remain+No – 28','rgba(253, 227, 212, 0.5)'],
[0,6,3,'#4994CE','Leave+No – 16','rgba(242, 116, 32, 1)'],
[0,7,5,'#FABC13','Remain+Yes – 21','rgba(253, 227, 212, 0.5)'],
[1,5,14,'#7FC241','Leave+Yes – 14','rgba(219, 233, 246, 0.5)'],
[1,6,1,'#D3D3D3','Didn’t vote in at least one referendum – 21','rgba(73, 148, 206, 1)'],
[1,7,1,'#8A5988','46 – No','rgba(219, 233, 246,0.5)'],
[2,5,3,'#449E9E','39 – Yes','rgba(250, 188, 19, 1)'],
[2,6,17,'#D3D3D3','14 – Don’t know / would not vote','rgba(250, 188, 19, 0.5)'],
[2,7,2,'','','rgba(250, 188, 19, 0.5)'],
[3,5,3,'','','rgba(127, 194, 65, 1)'],
[3,6,9,'','','rgba(127, 194, 65, 0.5)'],
[3,7,2,'','','rgba(127, 194, 65, 0.5)'],
[4,5,5,'','','rgba(211, 211, 211, 0.5)'],
[4,6,9,'','','rgba(211, 211, 211, 0.5)'],
[4,7,8,'','','rgba(211, 211, 211, 0.5)']
]
headers = data.pop(0)
df = pd.DataFrame(data, columns = headers)
scottish_df = df
data_trace = dict(
    type='sankey',
    domain = dict(
        x = [0,1],
        y = [0,1]
    ),
    orientation = "h",
    valueformat = ".0f",
    node = dict(
        pad = 10,
        thickness = 30,
        line = dict(
            color = "black",
            width = 0
        ),
        label = scottish_df['Node, Label'].dropna(axis=0, how='any'),
        color = scottish_df['Color']
    ),
    link = dict(
        source = scottish_df['Source'].dropna(axis=0, how='any'),
        target = scottish_df['Target'].dropna(axis=0, how='any'),
        value = scottish_df['Value'].dropna(axis=0, how='any'),
        color = scottish_df['Link Color'].dropna(axis=0, how='any'),
    )
)
layout = dict(
    title = "Scottish Referendum Voters who now want Independence",
    height = 772,
    font = dict(
        size = 10
    ),
)
fig = dict(data=[data_trace], layout=layout)
iplot(fig, validate=False)
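Answer:
The question does look strange, but only until you analyze how a sankey diagram is created in plotly.
When you create a sankey diagram, you pass it two lists: a list of nodes and a list of links.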
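These lists are bound to each other. When you create a node list of length 5, every edge knows 0, 1, 2, 3, 4 as its possible start and end points. In your program you create the nodes incorrectly: you create a list of links, then iterate over it and create the nodes from it. Look at your plot. It has two black nodes with undefined inside. What is the length of your dataset... yes, 5. Your node indices end at 4, and none of the target nodes are actually defined. You add a sixth row to the dataset and, bingo, nodes[5] exists! Just try adding another row to the dataset: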
[1,7,1,'#FF0000','WAKA','rgba(219, 233, 246,0.5)']
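You will see another black bar turn red. You have five nodes (because you have 5 links and create the nodes by iterating over the link list), but the link target indices are 5, 6, 7. You can fix it in two ways:
Create the nodes separately from the links (see the edit below).
Change Target to 2, 3, 4.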
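The index mismatch described above can be detected before plotting. The helper out_of_range_links below is hypothetical; it lists every link whose source or target is not a valid position in the node list. The example uses the five labels and five links of data sample 3, then applies the second fix (remapping targets 5, 6, 7 to 2, 3, 4):

```python
def out_of_range_links(n_nodes, sources, targets):
    """Return (link position, source, target) for links pointing past the node list."""
    return [
        (i, s, t)
        for i, (s, t) in enumerate(zip(sources, targets))
        if not (0 <= s < n_nodes and 0 <= t < n_nodes)
    ]

# Data sample 3: five labels, so only nodes 0..4 exist
labels = ['Remain+No – 28', 'Leave+No – 16', 'Remain+Yes – 21',
          'Leave+Yes – 14', 'Didn’t vote in at least one referendum – 21']
sources = [0, 0, 0, 1, 1]
targets = [5, 6, 7, 5, 6]

print(out_of_range_links(len(labels), sources, targets))
# [(0, 0, 5), (1, 0, 6), (2, 0, 7), (3, 1, 5), (4, 1, 6)]

# Second fix: move the targets into the existing index range 0..4
remap = {5: 2, 6: 3, 7: 4}
fixed_targets = [remap.get(t, t) for t in targets]
print(fixed_targets)                                             # [2, 3, 4, 2, 3]
print(out_of_range_links(len(labels), sources, fixed_targets))  # []
```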
I hope I've helped you solve the problem and understand how the plot is created (the more important part, IMO).
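Edit: below is an example of creating the nodes and links separately (note that the node part of data_trace uses only nodes_df data, the link part of data_trace uses only links_df data, and nodes_df and links_df have different lengths):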
import pandas as pd
import numpy as np
import plotly.graph_objs as go
from plotly.offline import download_plotlyjs, init_notebook_mode, plot, iplot
init_notebook_mode(connected=True)
nodes = [
['ID', 'Label', 'Color'],
[0,'Remain+No – 28','#F27420'],
[1,'Leave+No – 16','#4994CE'],
[2,'Remain+Yes – 21','#FABC13'],
[3,'Leave+Yes – 14','#7FC241'],
[4,'Didn’t vote in at least one referendum – 21','#D3D3D3'],
[5,'46 – No','#8A5988']
]
links = [
['Source','Target','Value','Link Color'],
[0,3,20,'rgba(253, 227, 212, 0.5)'],
[0,4,3,'rgba(242, 116, 32, 1)'],
[0,2,5,'rgba(253, 227, 212, 0.5)'],
[1,5,14,'rgba(219, 233, 246, 0.5)'],
[1,3,1,'rgba(73, 148, 206, 1)'],
[1,4,1,'rgba(219, 233, 246,0.5)'],
[1,2,10,'rgba(8, 233, 246,0.5)'],
[1,3,5,'rgba(219, 77, 246,0.5)'],
[1,5,12,'rgba(219, 4, 246,0.5)']
]
nodes_headers = nodes.pop(0)
nodes_df = pd.DataFrame(nodes, columns = nodes_headers)
links_headers = links.pop(0)
links_df = pd.DataFrame(links, columns = links_headers)
data_trace = dict(
    type='sankey',
    domain = dict(
        x = [0,1],
        y = [0,1]
    ),
    orientation = "h",
    valueformat = ".0f",
    node = dict(
        pad = 10,
        thickness = 30,
        line = dict(
            color = "black",
            width = 0
        ),
        label = nodes_df['Label'].dropna(axis=0, how='any'),
        color = nodes_df['Color']
    ),
    link = dict(
        source = links_df['Source'].dropna(axis=0, how='any'),
        target = links_df['Target'].dropna(axis=0, how='any'),
        value = links_df['Value'].dropna(axis=0, how='any'),
        color = links_df['Link Color'].dropna(axis=0, how='any'),
    )
)
layout = dict(
    title = "Scottish Referendum Voters who now want Independence",
    height = 772,
    font = dict(
        size = 10
    ),
)
fig = dict(data=[data_trace], layout=layout)
iplot(fig, validate=False)
Edit 2: Let's dig deeper :) Nodes and links in a sankey diagram are almost completely independent. The only thing that ties them together is the indices in the links' Source and Target. So we can create many nodes that have no links at all (just replace the nodes/links in the Edit 1 code):
nodes = [
['ID', 'Label', 'Color'],
[0,'Remain+No – 28','#F27420'],
[1,'Leave+No – 16','#4994CE'],
[2,'Remain+Yes – 21','#FABC13'],
[3,'Leave+Yes – 14','#7FC241'],
[4,'Didn’t vote in at least one referendum – 21','#D3D3D3'],
[5,'46 – No','#8A5988'],
[6,'WAKA1','#8A5988'],
[7,'WAKA2','#8A5988'],
[8,'WAKA3','#8A5988'],
[9,'WAKA4','#8A5988'],
[10,'WAKA5','#8A5988'],
[11,'WAKA6','#8A5988'],
]
links = [
['Source','Target','Value','Link Color'],
[0,3,20,'rgba(253, 227, 212, 0.5)'],
[0,4,3,'rgba(242, 116, 32, 1)'],
[0,2,5,'rgba(253, 227, 212, 0.5)'],
[1,5,14,'rgba(219, 233, 246, 0.5)'],
[1,3,1,'rgba(73, 148, 206, 1)'],
[1,4,1,'rgba(219, 233, 246,0.5)'],
[1,2,10,'rgba(8, 233, 246,0.5)'],
[1,3,5,'rgba(219, 77, 246,0.5)'],
[1,5,12,'rgba(219, 4, 246,0.5)']
]
These nodes (6 to 11) will not appear in the diagram.
We can also create only links, without any nodes:
nodes = [
['ID', 'Label', 'Color'],
]
links = [
['Source','Target','Value','Link Color'],
[0,3,20,'rgba(253, 227, 212, 0.5)'],
[0,4,3,'rgba(242, 116, 32, 1)'],
[0,2,5,'rgba(253, 227, 212, 0.5)'],
[1,5,14,'rgba(219, 233, 246, 0.5)'],
[1,3,1,'rgba(73, 148, 206, 1)'],
[1,4,1,'rgba(219, 233, 246,0.5)'],
[1,2,10,'rgba(8, 233, 246,0.5)'],
[1,3,5,'rgba(219, 77, 246,0.5)'],
[1,5,12,'rgba(219, 4, 246,0.5)']
]
We will then have only links from nowhere to nowhere.
If you want to add a new source (1), add a new list to nodes, calculate its index (that's why I have the ID column), and add a new list to links whose Source equals that node index.
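The recipe for adding a new source can be sketched with two hypothetical helpers, add_node and add_link, working on the same list-of-lists tables (header row first) used in this answer. It assumes node IDs are contiguous and start at 0, so a new node's ID is simply the number of existing nodes; the label and colors are placeholders:

```python
def add_node(nodes, label, color):
    """Append a node row and return its index (which is also its ID)."""
    new_id = len(nodes) - 1  # subtract the header row
    nodes.append([new_id, label, color])
    return new_id

def add_link(links, source, target, value, color):
    """Append a link row; source and target must be existing node IDs."""
    links.append([source, target, value, color])

nodes = [
    ['ID', 'Label', 'Color'],
    [0, 'Remain+No – 28', '#F27420'],
    [1, 'Leave+No – 16', '#4994CE'],
]
links = [['Source', 'Target', 'Value', 'Link Color']]

# New source node, then a link whose Source is that node's index
new_source = add_node(nodes, 'New source', '#000000')
add_link(links, new_source, 1, 10, 'rgba(0, 0, 0, 0.3)')
print(new_source)  # 2
print(links[-1])   # [2, 1, 10, 'rgba(0, 0, 0, 0.3)']
```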
If you want to add new targets (2) to an existing node, just add new lists to links and write their Source and Target correctly:
[1,100500,10,'rgba(219, 233, 246,0.5)'],
[1,100501,10,'rgba(8, 233, 246,0.5)'],
[1,100502,10,'rgba(219, 77, 246,0.5)'],
[1,100503,10,'rgba(219, 4, 246,0.5)']
(Here I created 4 new links to 4 new targets. The source of all of them is the node with index 1.)
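(3 + 4): A Sankey diagram does not have separate sources and targets. They are all just Sankey nodes, and every node can be both a source and a target. Look at this:
Here you get a Sankey diagram with 3 columns. Node 0 is a source, node 1 is a target, and node 2 is both the source for node 1 and the target of node 0.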
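A hypothetical set of links matching that description (0 to 2, then 2 to 1) illustrates it. The sketch puts each node in a column equal to the length of the longest chain of links ending at it, which is one simple way to see why node 2 lands in the middle column; plotly's own layout may align nodes differently, and the sketch assumes the links contain no cycles:

```python
from functools import lru_cache

# Hypothetical links: 0 -> 2 -> 1
sources = [0, 2]
targets = [2, 1]

# Predecessors of every node that is the target of some link
incoming = {}
for s, t in zip(sources, targets):
    incoming.setdefault(t, []).append(s)

@lru_cache(maxsize=None)
def column(node):
    """Column = length of the longest chain of links ending at node (no cycles assumed)."""
    preds = incoming.get(node, [])
    return 0 if not preds else 1 + max(column(p) for p in preds)

for node in range(3):
    roles = []
    if node in sources:
        roles.append('source')
    if node in targets:
        roles.append('target')
    print(node, column(node), '+'.join(roles))
# 0 0 source
# 1 2 target
# 2 1 source+target
```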