Sankey bar chart diagram with pandas or Python

Date: 2017-09-18 14:55:30

Tags: python pandas matplotlib bar-chart sankey-diagram

I would like to make a bar chart like this one, using any Python module that I can combine with matplotlib:

[Image: Sankey stacked bar chart]

Here is some sample data and an explanation of what I can do so far:

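import pandas
import matplotlib.pyplot as plt
from io import StringIO

text="""
Name                           1980              1982
A                    Administration            Budget
B                    Administration    Administration
C                    Administration    Administration
D                    Administration            Budget
E                    Administration            Budget
F                    Administration    Administration
G                    Administration    Administration
H                    Administration    Administration
"""

# strip() drops the leading blank line, so the "Name 1980 1982" row is unambiguously the header
data=pandas.read_fwf(StringIO(text.strip()),header=0).set_index("Name")

# One row per department, one column per year, holding head counts
count=pandas.DataFrame(index=["Administration","Budget"])
for col in data.columns:
    count[col]=data[col].value_counts()
count=count.fillna(0)  # nobody was in Budget in 1980, which would otherwise leave a NaN

count.T.plot(kind="bar",stacked=True)
plt.show()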

When I plot count, I get the following stacked bar chart:

[Image: stacked bar chart]

I can also get the number of people who moved from Administration to Budget between 1980 and 1982 with

pandas.crosstab(data["1980"],data["1982"])

which gives:

1982            Administration  Budget
1980                                  
Administration               5       3
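Each non-zero cell of this crosstab is one flow between a segment of the 1980 bar and a segment of the 1982 bar. As a minimal sketch (the names `ct` and `triples` are illustrative, and the sample data is rebuilt directly as a DataFrame instead of being parsed with `read_fwf`), the table can be flattened into (1980 department, 1982 department, count) triples, which is the shape any flow-drawing code needs:

```python
import pandas as pd

# Sample data from the question, built directly instead of parsed with read_fwf
data = pd.DataFrame({
    "1980": ["Administration"] * 8,
    "1982": ["Budget", "Administration", "Administration", "Budget",
             "Budget", "Administration", "Administration", "Administration"],
}, index=list("ABCDEFGH"))

ct = pd.crosstab(data["1980"], data["1982"])

# Walk the crosstab cell by cell and keep only transitions that actually happened
triples = [(src, dst, int(ct.loc[src, dst]))
           for src in ct.index
           for dst in ct.columns
           if ct.loc[src, dst] > 0]
print(triples)
# → [('Administration', 'Administration', 5), ('Administration', 'Budget', 3)]
```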

But I don't know how to draw the flows between the segments of the bars. Does anyone know how to do this?

1 Answer

Answer 0 (score: 0)

You can use the pandas functions crosstab and melt to prepare the data for a Sankey diagram:

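from io import StringIO

import pandas as pd
import plotly.graph_objects as go

text = """
Name                           1980              1982
A                    Administration            Budget
B                    Administration    Administration
C                    Administration    Administration
D                    Administration            Budget
E                    Administration            Budget
F                    Administration    Administration
G                    Administration    Administration
H                    Administration    Administration
"""
# strip() drops the leading blank line so the header is unambiguously row 0
data = pd.read_fwf(StringIO(text.strip()), header=0)

# Make crosstab: rows are the 1980 departments, columns the 1982 departments
data_cross = pd.crosstab(data['1980'], data['1982'])
print(data_cross)

# Make flat table (drop the "1982" columns name so melt can reuse it)
data_tidy = data_cross.rename_axis(None, axis=1).reset_index()

# Make tidy table: one row per (1980, 1982) pair with its head count
formatted_data = pd.melt(data_tidy,
                         ['1980'],
                         var_name='1982',
                         value_name='Value')

# One node per department on each side: the 1980 nodes first, then the 1982 nodes
left = list(data_cross.index)
right = list(data_cross.columns)
labels = left + right
node_colors = {'Administration': 'blue', 'Budget': 'green'}
link_colors = {'Administration': 'lightblue', 'Budget': 'lightgreen'}

fig = go.Figure(data=[go.Sankey(
    node=dict(
        pad=15,
        thickness=20,
        line=dict(color="black", width=0.5),
        label=labels,
        color=[node_colors[name] for name in labels]
    ),
    link=dict(
        # indices point into `labels`; the 1982 nodes are offset by len(left)
        source=[left.index(s) for s in formatted_data['1980']],
        target=[len(left) + right.index(t) for t in formatted_data['1982']],
        value=formatted_data['Value'].tolist(),
        color=[link_colors[t] for t in formatted_data['1982']]
    ))])

fig.update_layout(title_text="Basic Sankey Diagram", font_size=10)
fig.show()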

[Image: snapshot of the resulting Sankey figure]
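The question asked for something that works with matplotlib rather than plotly. A minimal sketch, assuming plain matplotlib, is to draw the two years as stacked bars and fill a band between matching segments with `Axes.fill_between`. The helpers `flow_bands` and `plot_sankey_bars`, the S-shaped "smoothstep" curve and the use of the default color cycle are illustrative choices, not an existing matplotlib API (matplotlib's own `matplotlib.sankey.Sankey` draws a different kind of flow diagram):

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt


def flow_bands(ct):
    """Hypothetical helper: for every non-zero cell of a crosstab (rows = first
    year, columns = second year) return
    (source, target, (y0, y1) on the left bar, (y0, y1) on the right bar)."""
    # Same category order on both sides, in order of first appearance
    cats = list(dict.fromkeys(list(ct.index) + list(ct.columns)))
    full = ct.reindex(index=cats, columns=cats, fill_value=0)
    left_tot, right_tot = full.sum(axis=1), full.sum(axis=0)
    # Bottom edge of each category's segment in the two stacked bars
    left_base = left_tot.cumsum() - left_tot
    right_base = right_tot.cumsum() - right_tot
    # Height already used inside each segment, so bands stack without overlap
    left_used = dict.fromkeys(cats, 0)
    right_used = dict.fromkeys(cats, 0)
    bands = []
    for src in cats:
        for dst in cats:
            n = int(full.loc[src, dst])
            if n == 0:
                continue
            l0 = int(left_base[src]) + left_used[src]
            r0 = int(right_base[dst]) + right_used[dst]
            left_used[src] += n
            right_used[dst] += n
            bands.append((src, dst, (l0, l0 + n), (r0, r0 + n)))
    return bands


def plot_sankey_bars(ct, ax, labels=("1980", "1982"), width=0.3):
    """Hypothetical helper: stacked bars at x=0 and x=1, joined by shaded bands."""
    cats = list(dict.fromkeys(list(ct.index) + list(ct.columns)))
    full = ct.reindex(index=cats, columns=cats, fill_value=0)
    palette = plt.rcParams["axes.prop_cycle"].by_key()["color"]
    colors = dict(zip(cats, palette))
    for x, totals in ((0, full.sum(axis=1)), (1, full.sum(axis=0))):
        bottom = 0
        for c in cats:
            ax.bar(x, totals[c], width, bottom=bottom, color=colors[c],
                   label=c if x == 0 else "_nolegend_")
            bottom += totals[c]
    # x positions from the right edge of bar 0 to the left edge of bar 1
    xs = np.linspace(width / 2, 1 - width / 2, 50)
    t = (xs - xs[0]) / (xs[-1] - xs[0])
    s = 3 * t**2 - 2 * t**3  # smoothstep: 0 at the left bar, 1 at the right bar
    for _src, dst, (l0, l1), (r0, r1) in flow_bands(ct):
        # Color each band by its destination so the split is visible
        ax.fill_between(xs, l0 + (r0 - l0) * s, l1 + (r1 - l1) * s,
                        color=colors[dst], alpha=0.4, linewidth=0)
    ax.set_xticks([0, 1])
    ax.set_xticklabels(labels)
    ax.legend()


if __name__ == "__main__":
    # Sample data from the question, built directly instead of via read_fwf
    data = pd.DataFrame({
        "1980": ["Administration"] * 8,
        "1982": ["Budget", "Administration", "Administration", "Budget",
                 "Budget", "Administration", "Administration", "Administration"],
    }, index=list("ABCDEFGH"))
    ct = pd.crosstab(data["1980"], data["1982"])
    fig, ax = plt.subplots()
    plot_sankey_bars(ct, ax)
    plt.show()
```

Splitting the geometry (`flow_bands`) from the drawing keeps the stacking logic easy to check on its own; with the sample data the Administration→Administration band spans 0 to 5 on both bars and the Administration→Budget band spans 5 to 8.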