Positioning nodes vertically in a Sankey diagram to avoid collisions with links

Time: 2019-07-15 12:19:28

Tags: python plotly sankey-diagram

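I'm trying to make a Sankey plot with Plotly that follows how documents are filtered into being either in scope or out of scope, i.e. 1 source and 2 targets at each step. Some documents are filtered out during step 1, some during step 2, and so on. This leads to the following Sankey plot: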

Current output

What I would ideally like is for it to look something like this:

Ideal output

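I've already looked through the documentation at https://plot.ly/python/reference/#sankey but couldn't find what I'm looking for. Ideally, I would like a feature that prevents the plot from overlapping nodes and links.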

This is the code I'm using to generate the plot object:

import math
from random import shuffle

import pandas as pd
import plotly.graph_objects as go


def genSankeyPlotObject(df, cat_cols=(), value_cols='', visible=False):

    ### COLOR PALETTE TO USE (HEX STRINGS NEED THE LEADING '#' TO BE VALID PLOTLY COLORS)
    colorPalette = ['#472d3c', '#5e3643', '#7a444a', '#a05b53', '#bf7958', '#eea160', '#f4cca1', '#b6d53c', '#71aa34', '#397b44',
                    '#3c5956', '#302c2e', '#5a5353', '#7d7071', '#a0938e', '#cfc6b8', '#dff6f5', '#8aebf1', '#28ccdf', '#3978a8',
                    '#394778', '#39314b', '#564064', '#8e478c', '#cd6093', '#ffaeb6', '#f4b41b', '#f47e1b', '#e6482e', '#a93b3b',
                    '#827094', '#4f546b']

    ### CREATES LABELLIST FROM DEFINED COLUMNS
    labelList = []
    for catCol in cat_cols:
        labelListTemp = list(set(df[catCol].values))
        labelList = labelList + labelListTemp
    labelList = list(dict.fromkeys(labelList))

    ### PICKS ONE COLOR PER NODE, REPEATING THE PALETTE IF THERE ARE MORE NODES THAN COLORS
    colorNum = len(labelList)
    tempColorPalette = colorPalette * math.ceil(colorNum / len(colorPalette))
    shuffle(tempColorPalette)
    colorList = tempColorPalette[0:colorNum]

    ### TRANSFORMS DF INTO SOURCE -> TARGET PAIRS
    for i in range(len(cat_cols)-1):
        if i==0:
            sourceTargetDf = df[[cat_cols[i],cat_cols[i+1],value_cols]]
            sourceTargetDf.columns = ['source','target','count']
        else:
            tempDf = df[[cat_cols[i],cat_cols[i+1],value_cols]]
            tempDf.columns = ['source','target','count']
            sourceTargetDf = pd.concat([sourceTargetDf,tempDf])
        sourceTargetDf = sourceTargetDf.groupby(['source','target']).agg({'count':'sum'}).reset_index()

    ### ADDING INDEX TO SOURCE -> TARGET PAIRS
    sourceTargetDf['sourceID'] = sourceTargetDf['source'].apply(lambda x: labelList.index(x))
    sourceTargetDf['targetID'] = sourceTargetDf['target'].apply(lambda x: labelList.index(x))

    ### CREATES THE SANKEY PLOT OBJECT
    data = go.Sankey(node = dict(pad = 15,
                                 thickness = 20,
                                 line = dict(color = "black",
                                             width = 0.5),
                                 label = labelList,
                                 color = colorList),
                     link = dict(source = sourceTargetDf['sourceID'],
                                 target = sourceTargetDf['targetID'],
                                 value = sourceTargetDf['count']),
                     valuesuffix = ' ' + value_cols,
                     visible = visible)

    return data
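
For the overlap problem described above, one direction that may be worth trying is to give every node an explicit position. The sketch below assumes a Plotly version whose Sankey trace accepts `node.x`, `node.y` and `arrangement`. The helper names `layout_positions` and `build_sankey`, the node labels and the link values are all hypothetical and only illustrate the idea: in-scope nodes stay on the top row, out-of-scope nodes go on the bottom row, and each filtering step gets its own column.

```python
def layout_positions(columns, eps=0.001):
    """Return (labels, xs, ys) for nodes laid out on a grid.

    `columns` is a list of lists, one per Sankey column, read top to
    bottom. None marks an empty grid cell. Coordinates are normalized.
    """
    n_cols = len(columns)
    n_rows = max(len(col) for col in columns)
    labels, xs, ys = [], [], []
    for ci, col in enumerate(columns):
        # Spread the columns evenly from left to right.
        x = ci / (n_cols - 1) if n_cols > 1 else 0.5
        for ri, label in enumerate(col):
            if label is None:
                continue
            # Centre each node vertically inside its grid row.
            y = (ri + 0.5) / n_rows
            labels.append(label)
            # Precaution (an assumption, not documented behaviour):
            # keep coordinates strictly inside the open interval (0, 1).
            xs.append(min(max(x, eps), 1 - eps))
            ys.append(min(max(y, eps), 1 - eps))
    return labels, xs, ys


def build_sankey(columns, links):
    """Build a Sankey trace from grid columns and (source, target, value) links."""
    # Imported here so that layout_positions itself has no dependencies.
    import plotly.graph_objects as go

    labels, xs, ys = layout_positions(columns)
    index = {label: i for i, label in enumerate(labels)}
    return go.Sankey(
        arrangement="snap",
        node=dict(label=labels, x=xs, y=ys, pad=15, thickness=20),
        link=dict(
            source=[index[s] for s, _, _ in links],
            target=[index[t] for _, t, _ in links],
            value=[v for _, _, v in links],
        ),
    )


# Hypothetical two-step filter: top row = in scope, bottom row = out of scope.
columns = [
    ["All documents", None],
    ["In scope (step 1)", "Out of scope (step 1)"],
    ["In scope (step 2)", "Out of scope (step 2)"],
]
# Illustrative values only, not real data.
links = [
    ("All documents", "In scope (step 1)", 70),
    ("All documents", "Out of scope (step 1)", 30),
    ("In scope (step 1)", "In scope (step 2)", 50),
    ("In scope (step 1)", "Out of scope (step 2)", 20),
]
labels, xs, ys = layout_positions(columns)
```

With Plotly installed, `go.Figure(build_sankey(columns, links)).show()` would render the result. Whether the links actually clear the nodes still depends on the data and the figure size, so the row and column assignment may need adjusting by hand.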

0 answers:

No answers