Python Sunburst Chart - Converting a DataFrame to a Nested List Format

Time: 2018-05-14 20:48:44

Tags: python list dataframe decision-tree sunburst-diagram

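I followed the link below to create a Sunburst chart in Python: How to make a sunburst plot in R or Python?

The notebook is attached for reference.

However, the function that creates the chart needs its data in a specific format: a list nested by level. For example: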

data = [
    ('/', 100, [
        ('home', 70, [
            ('Images', 40, []),
            ('Videos', 20, []),
            ('Documents', 5, []),
        ]),
        ('usr', 15, [
            ('src', 6, [
                ('linux-headers', 4, []),
                ('virtualbox', 1, []),

            ]),
            ('lib', 4, []),
            ('share', 2, []),
            ('bin', 1, []),
            ('local', 1, []),
            ('include', 1, []),
        ]),
    ]),
]
sunburst(data)
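In this format each node is a `(label, total, children)` tuple, and a node's total covers its children plus any volume that belongs to the node itself. For example, `/` is 100 = 70 (`home`) + 15 (`usr`) + 15 of its own. The sketch below walks the nested list and reports that leftover for each node, which is a quick way to sanity-check a generated list. The helper name `own_values` is hypothetical and not part of the linked `sunburst` function.

```python
def own_values(nodes, prefix=()):
    """Hypothetical helper: yield (path, own) where own = node total minus its children's totals."""
    for label, total, children in nodes:
        path = prefix + (label,)
        yield path, total - sum(child[1] for child in children)
        yield from own_values(children, path)  # descend with the extended path

# Nested list from the question: (label, total, children)
data = [
    ('/', 100, [
        ('home', 70, [('Images', 40, []), ('Videos', 20, []), ('Documents', 5, [])]),
        ('usr', 15, [
            ('src', 6, [('linux-headers', 4, []), ('virtualbox', 1, [])]),
            ('lib', 4, []), ('share', 2, []), ('bin', 1, []),
            ('local', 1, []), ('include', 1, []),
        ]),
    ]),
]

for path, own in own_values(data):
    if own:  # skip nodes whose total is fully explained by their children
        print(" > ".join(path), own)
```

The non-leaf leftovers (15 for `/`, 5 for `home`, 1 for `src`) correspond to the rows with blank trailing cells in the Excel output below.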

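For the same example, suppose someone gives me a decision tree output in an Excel file, with the node hierarchy laid out as levels. Is there a way to convert this Excel output (shown below) into the list above, so that I can create the Sunburst with the given function?

Excel output: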

Level0,Level1,Level2,Level3,Volume
/,,,,15
/,home,Images,,40
/,home,Videos,,20
/,home,Documents,,5
/,home,,,5
/,usr,src,linux-headers,4
/,usr,src,virtualbox,1
/,usr,src,,1
/,usr,lib,,4
/,usr,share,,2
/,usr,bin,,1
/,usr,local,,1
/,usr,include,,1
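If the sheet is saved as CSV (as the answers below assume with `test.csv`), pandas reads the blank cells as `NaN`; `pd.read_excel` treats empty cells in the workbook the same way, though it needs an engine such as openpyxl installed. A minimal sketch, using an inline copy of the data via `io.StringIO` so it runs without a file; the `Depth` column is an illustrative addition, not something the question's data contains.

```python
import io
import pandas as pd

CSV_TEXT = """Level0,Level1,Level2,Level3,Volume
/,,,,15
/,home,Images,,40
/,home,Videos,,20
/,home,Documents,,5
/,home,,,5
/,usr,src,linux-headers,4
/,usr,src,virtualbox,1
/,usr,src,,1
/,usr,lib,,4
/,usr,share,,2
/,usr,bin,,1
/,usr,local,,1
/,usr,include,,1
"""

df = pd.read_csv(io.StringIO(CSV_TEXT))
level_cols = ['Level0', 'Level1', 'Level2', 'Level3']

# Blank cells become NaN, so a row's depth is its count of non-null level columns
df['Depth'] = df[level_cols].notna().sum(axis=1)
print(df)
```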

3 answers:

Answer 0 (score: 1)

You can do this with a pandas DataFrame and recursion:

import pandas as pd

def df_to_nested(dataframe, _groupby, level, col):
    """
    - dataframe: source data
    - _groupby: groupby columns
    - level: start from this level (pass 0 to start at the root)
    - col: value to aggregate
    """
    if len(dataframe) == 1:
        return [] # Reached max depth
    else:
        result = []
        df = dataframe.groupby(_groupby[level])
        level += 1 # Level0 -> Level1 (increase level)
        for key, val in df: # Iterate through groups
            result.append(tuple([key, val[col].sum(), df_to_nested(val, _groupby, level, col)]))
        level -= 1 # Level1 -> Level0 (decrease level)
        return result

df = pd.read_csv('test.csv') # Read your file (the Excel sheet saved as CSV; pd.read_excel also works)

_groupby = ['Level0', 'Level1', 'Level2', 'Level3'] # Group by cols

result = df_to_nested(df, _groupby, 0, 'Volume')

print(result)

Sample output:

[
    ('/', 100, [
        ('home', 70, [
            ('Documents', 5, []),
            ('Images', 40, []),
            ('Videos', 20, [])
        ]),
        ('usr', 15, [
            ('bin', 1, []),
            ('include', 1, []),
            ('lib', 4, []),
            ('local', 1, []),
            ('share', 2, []),
            ('src', 6, [
                ('linux-headers', 4, []),
                ('virtualbox', 1, [])
            ])
        ])
    ])
]
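Rows such as `/,home,,,5` do not appear as extra children because `DataFrame.groupby` drops `NaN` keys by default (`dropna=True`; the parameter exists since pandas 1.1), while each parent's `.sum()` is taken over the whole group and so still includes them. A small sketch of that behavior on the rows under `home`:

```python
import pandas as pd

# Rows under 'home': three leaves plus one row whose Level2 cell is blank
home = pd.DataFrame({
    'Level2': ['Images', 'Videos', 'Documents', None],
    'Volume': [40, 20, 5, 5],
})

total = home['Volume'].sum()  # the parent total includes the blank row
children = {key: grp['Volume'].sum() for key, grp in home.groupby('Level2')}  # blank key dropped
with_blank = home.groupby('Level2', dropna=False)['Volume'].sum()  # blank kept as a NaN key

print(total)
print(children)
print(with_blank)
```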

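Answer 1 (score: 1)

I wanted to use the answers to this question to build a Plotly Sunburst in JavaScript. However, their output does not match the data format used in these examples. I wanted output closer to this answer, so I modified the code slightly, and now I can use it with JavaScript on the front end.

I'm leaving it here in case it helps someone.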

import pandas as pd

def df_to_nested(dataframe, _groupby, level=0, col='count'):
    """Flatten the hierarchy into Plotly-style records (labels, values, parents, ids)."""
    result = []
    parent_cols = _groupby[:level]

    for key, val in dataframe.groupby(_groupby[level]):  # Iterate through groups
        # Parent id is the ancestors' labels joined with "-", e.g. "/-home"
        row = val.head(1)
        parents = "-".join(str(row[p].iloc[0]) for p in parent_cols)
        ids = parents + "-" + str(key) if parents else str(key)

        result.append({'labels': key, 'values': int(val[col].sum()),
                       'parents': parents, 'ids': ids})

        # Recurse into the next level unless this is the deepest one
        if level < len(_groupby) - 1:
            result.extend(df_to_nested(val, _groupby, level + 1, col))

    return result

def get_sunburst_format(df, path):
    # path is the list of columns in the dataframe that you want a sunburst from
    tmp = df.copy()
    tmp['count'] = 1  # each row counts as 1, so the values are row counts
    sunburst_data = pd.DataFrame(df_to_nested(tmp, path))
    return {column: list(sunburst_data[column]) for column in sunburst_data.columns}
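For comparison, here is a non-recursive sketch that builds the same `ids`/`labels`/`parents`/`values` lists directly from the level columns, summing a value column (here `Volume`) instead of counting rows. It assumes, as the answer does, that ids are the path joined with `-`, and that a blank cell ends a row's path. These four lists correspond to the `ids`, `labels`, `parents` and `values` attributes of a Plotly sunburst trace; with `branchvalues='total'` Plotly expects each parent's value to cover its children, which holds here because every row adds its volume to all of its ancestors. The function name `flat_sunburst` is hypothetical.

```python
import pandas as pd

def flat_sunburst(df, path, value_col):
    """Hypothetical helper: aggregate value_col over every prefix of the path columns."""
    totals = {}  # id -> [label, parent_id, value], in first-seen order
    for _, row in df.iterrows():
        parent_id = ""
        for col in path:
            label = row[col]
            if pd.isna(label):
                break  # a blank cell ends this row's path
            node_id = f"{parent_id}-{label}" if parent_id else str(label)
            entry = totals.setdefault(node_id, [label, parent_id, 0])
            entry[2] += row[value_col]  # every ancestor receives the row's volume
            parent_id = node_id
    return {
        'ids': list(totals),
        'labels': [label for label, _, _ in totals.values()],
        'parents': [parent for _, parent, _ in totals.values()],
        'values': [int(value) for _, _, value in totals.values()],
    }

rows = [
    ['/', None, None, None, 15],
    ['/', 'home', 'Images', None, 40],
    ['/', 'home', 'Videos', None, 20],
    ['/', 'home', 'Documents', None, 5],
    ['/', 'home', None, None, 5],
    ['/', 'usr', 'src', 'linux-headers', 4],
    ['/', 'usr', 'src', 'virtualbox', 1],
    ['/', 'usr', 'src', None, 1],
    ['/', 'usr', 'lib', None, 4],
    ['/', 'usr', 'share', None, 2],
    ['/', 'usr', 'bin', None, 1],
    ['/', 'usr', 'local', None, 1],
    ['/', 'usr', 'include', None, 1],
]
cols = ['Level0', 'Level1', 'Level2', 'Level3']
df = pd.DataFrame(rows, columns=cols + ['Volume'])
result = flat_sunburst(df, cols, 'Volume')
print(result['ids'])
```

A dictionary keyed by id keeps one entry per node without recursion, at the cost of iterating row by row.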

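Answer 2 (score: 0)

Nam Nguyen's answer is very good, but it has a small bug when a group contains only one record: the check len(dataframe) == 1 becomes True, the recursion stops, and that record's value at the next level is left out of the result. I have updated his answer to handle this case as well: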

def df_to_nested(dataframe, _groupby, level, col):
    """
    - dataframe: source data
    - _groupby: groupby columns
    - level: start from this level (pass 0 to start at the root)
    - col: value to aggregate
    """
    result = []
    if len(dataframe) == 1:
        try:
            df = dataframe.groupby(_groupby[level])
            for key, val in df: # Iterate through groups
                result.append(tuple([key, val[col].sum(), []]))
        except IndexError: # Reached max depth: no more level columns
            pass
    else:
        df = dataframe.groupby(_groupby[level])
        level += 1 # Level0 -> Level1 (increase level)
        for key, val in df: # Iterate through groups
            result.append(tuple([key, val[col].sum(), df_to_nested(val, _groupby, level, col)]))
        level -= 1 # Level1 -> Level0 (decrease level)

    return result
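This patch recovers one extra level below a single-record group, but a lone row that reaches two or more levels below the group is still cut short. For example, if `/,usr,src,linux-headers,4` were the only row, `src` would be kept but `linux-headers` dropped. A sketch of an alternative with the same tuple output, assuming an integer value column: stop on depth instead of row count, so single rows keep descending until the level columns run out or a blank cell is reached. The name `df_to_nested_by_depth` is hypothetical; this is a variation on the answers above, not code from them.

```python
import pandas as pd

def df_to_nested_by_depth(dataframe, _groupby, level=0, col='Volume'):
    """Hypothetical variant: recurse until the level columns run out, whatever the row count."""
    if level >= len(_groupby):
        return []  # no deeper level columns
    result = []
    # groupby drops NaN keys by default, so a blank cell simply ends a branch
    for key, val in dataframe.groupby(_groupby[level]):
        children = df_to_nested_by_depth(val, _groupby, level + 1, col)
        result.append((key, int(val[col].sum()), children))  # int() assumes integer volumes
    return result

cols = ['Level0', 'Level1', 'Level2', 'Level3']
single = pd.DataFrame([['/', 'usr', 'src', 'linux-headers', 4]], columns=cols + ['Volume'])
print(df_to_nested_by_depth(single, cols))
```

Because the stopping rule depends only on depth, the `len(dataframe) == 1` special case and the `try`/`except` are no longer needed.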