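I created a Sunburst chart in Python by following the link below: How to make a sunburst plot in R or Python?
The notebook is attached for reference.
However, the function that creates the chart needs its data in a specific format (a list nested by level). For example: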
data = [
    ('/', 100, [
        ('home', 70, [
            ('Images', 40, []),
            ('Videos', 20, []),
            ('Documents', 5, []),
        ]),
        ('usr', 15, [
            ('src', 6, [
                ('linux-headers', 4, []),
                ('virtualbox', 1, []),
            ]),
            ('lib', 4, []),
            ('share', 2, []),
            ('bin', 1, []),
            ('local', 1, []),
            ('include', 1, []),
        ]),
    ]),
]
sunburst(data)
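For the same example, suppose someone gives me decision-tree output in an Excel file, with the node hierarchy stored as levels. Is there a way to convert this Excel output (shown below) into the list above, so that I can create the Sunburst with the given function?
Excel output: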
Level0,Level1,Level2,Level3,Volume
/,,,,15
/,home,Images,,40
/,home,Videos,,20
/,home,Documents,,5
/,home,,,5
/,usr,src,linux-headers,4
/,usr,src,virtualbox,1
/,usr,src,,1
/,usr,lib,,4
/,usr,share,,2
/,usr,bin,,1
/,usr,local,,1
/,usr,include,,1
Answer 0 (score: 1)
You can do this with a pandas DataFrame and recursion:
import pandas as pd

def df_to_nested(dataframe, _groupby, level, col):
    """
    - dataframe: source data
    - _groupby: groupby columns
    - level: start from this level (pass 0 for the top level)
    - col: value to aggregate
    """
    if len(dataframe) == 1 or level >= len(_groupby):
        return []  # Reached max depth (single row, or no more level columns)
    else:
        result = []
        df = dataframe.groupby(_groupby[level])  # blank (NaN) cells are dropped
        level += 1  # Level0 -> Level1 (increase level)
        for key, val in df:  # Iterate through groups
            # int() so the totals print as plain Python ints
            result.append((key, int(val[col].sum()), df_to_nested(val, _groupby, level, col)))
        level -= 1  # Level1 -> Level0 (decrease level)
        return result

df = pd.read_csv('test.csv')  # Read your file (pd.read_excel('test.xlsx') for an Excel file)
_groupby = ['Level0', 'Level1', 'Level2', 'Level3']  # Group by cols
result = df_to_nested(df, _groupby, 0, 'Volume')
print(result)
Sample output:
[
    ('/', 100, [
        ('home', 70, [
            ('Documents', 5, []),
            ('Images', 40, []),
            ('Videos', 20, [])
        ]),
        ('usr', 15, [
            ('bin', 1, []),
            ('include', 1, []),
            ('lib', 4, []),
            ('local', 1, []),
            ('share', 2, []),
            ('src', 6, [
                ('linux-headers', 4, []),
                ('virtualbox', 1, [])
            ])
        ])
    ])
]
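If you would rather avoid the pandas dependency, the same nested list can be built by walking each row as a path and adding its Volume to every node along the way. This is a minimal sketch, not the answer's method: it assumes a blank cell ends a row's path, `rows_to_nested` is a hypothetical name, and the question's CSV is inlined with `io.StringIO` instead of being read from a file. Because it never looks at row counts, groups with a single record need no special case.

```python
import csv
import io

# The CSV from the question, inlined for the example.
CSV_TEXT = """Level0,Level1,Level2,Level3,Volume
/,,,,15
/,home,Images,,40
/,home,Videos,,20
/,home,Documents,,5
/,home,,,5
/,usr,src,linux-headers,4
/,usr,src,virtualbox,1
/,usr,src,,1
/,usr,lib,,4
/,usr,share,,2
/,usr,bin,,1
/,usr,local,,1
/,usr,include,,1
"""


def rows_to_nested(rows, level_cols, value_col):
    """Build [(name, total, children), ...] from path-like rows.

    Assumption: a blank cell ends the path for that row, and every node
    on the path (ancestors included) receives the row's value.
    """
    root = {}  # name -> [total, children dict]
    for row in rows:
        value = int(row[value_col])
        children = root
        for col in level_cols:
            name = row[col]
            if not name:  # blank cell: this row stops here
                break
            entry = children.setdefault(name, [0, {}])
            entry[0] += value
            children = entry[1]

    def to_tuples(children):
        # dicts keep insertion order, so children follow first appearance
        return [(name, total, to_tuples(kids)) for name, (total, kids) in children.items()]

    return to_tuples(root)


rows = csv.DictReader(io.StringIO(CSV_TEXT))
data = rows_to_nested(rows, ["Level0", "Level1", "Level2", "Level3"], "Volume")
print(data)
```

One design difference: children come out in the order they first appear in the file (here, the question's original order), whereas `groupby` sorts group keys alphabetically, as the sample output shows.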
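Answer 1 (score: 1)
I wanted to use the answers to this question to build a Plotly Sunburst in JavaScript. However, the output is not in the same data format as in these examples. I wanted output more like this answer, so I changed the code slightly, and now I can use it for JavaScript on the front end.
I'm leaving it here in case it's useful to someone.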
import pandas as pd

def df_to_nested(dataframe, _groupby, level=0, col='count'):
    result = []
    if level == (len(_groupby) - 1):
        # Deepest level: emit leaves, no further recursion
        df = dataframe.groupby(_groupby[level])
        parent_cols = _groupby[:level]
        for key, val in df:  # Iterate through groups
            # Ancestor labels joined with "-" form the parent's id
            parents = "-".join(str(val[p].iloc[0]) for p in parent_cols)
            result.append({'labels': key, 'values': int(val[col].sum()), 'parents': parents,
                           'ids': parents + "-" + str(key)})
    else:
        df = dataframe.groupby(_groupby[level])
        parent_cols = _groupby[:level]
        level += 1  # Level0 -> Level1 (increase level)
        for key, val in df:  # Iterate through groups
            if level == 1:
                parents = ""  # Root nodes have no parent
                ids = str(key)
            else:
                parents = "-".join(str(val[p].iloc[0]) for p in parent_cols)
                ids = parents + "-" + str(key)
            result.append({'labels': key, 'values': int(val[col].sum()), 'parents': parents,
                           'ids': ids})
            result.extend(df_to_nested(val, _groupby, level, col))
        level -= 1  # Level1 -> Level0 (decrease level)
    return result

def get_sunburst_format(df, path):
    # path is the list of columns in the dataframe that you want a sunburst from
    tmp = df.copy()
    tmp['count'] = 1  # Each row counts once, so values are row counts
    sunburst_data = pd.DataFrame(df_to_nested(tmp, path))
    # One list per key (labels, values, parents, ids); tolist() gives plain,
    # JSON-serializable Python values
    return {column: sunburst_data[column].tolist() for column in sunburst_data.columns}
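For a Plotly-style trace, the key relationship is that each node's `parents` entry equals its parent's `ids` entry. The sketch below reproduces the same "-"-joined id scheme with only the standard library, but sums the Volume column instead of counting rows. It is an illustration under assumptions, not the answer's code: `rows_to_plotly_lists` is a hypothetical name, blank cells are assumed to end a row's path, and the input is a three-row subset of the question's data.

```python
import csv
import io
import json

# Three rows taken from the question's CSV.
CSV_TEXT = """Level0,Level1,Level2,Level3,Volume
/,home,Images,,40
/,home,Videos,,20
/,usr,src,linux-headers,4
"""


def rows_to_plotly_lists(rows, level_cols, value_col, sep="-"):
    """Return parallel lists (ids, labels, parents, values) for a sunburst.

    A node's id is its path joined with `sep`; its parent is the id of the
    path without its last element ("" for the root).
    """
    totals = {}  # id -> summed value, in first-seen order
    meta = {}    # id -> (label, parent id)
    for row in rows:
        value = int(row[value_col])
        path = []
        for col in level_cols:
            label = row[col]
            if not label:  # blank cell ends this row's path
                break
            parent_id = sep.join(path)
            path.append(label)
            node_id = sep.join(path)
            totals[node_id] = totals.get(node_id, 0) + value
            meta[node_id] = (label, parent_id)
    ids = list(totals)
    return {
        "ids": ids,
        "labels": [meta[i][0] for i in ids],
        "parents": [meta[i][1] for i in ids],
        "values": [totals[i] for i in ids],
    }


data = rows_to_plotly_lists(csv.DictReader(io.StringIO(CSV_TEXT)),
                            ["Level0", "Level1", "Level2", "Level3"], "Volume")
print(json.dumps(data))  # ready to send to the JavaScript front end
```

Because each parent's value already includes its descendants, Plotly's `branchvalues='total'` setting matches these numbers. Joining with "-" can give ambiguous ids when labels themselves contain "-" (e.g. linux-headers); a separator that never occurs in labels avoids that.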
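Answer 2 (score: 0)
Nam Nguyen's answer is very good, but it has a small bug: when a particular level has only one record, the condition len(dataframe) == 1 becomes True, and that level's single value is not included in the result. I have updated his answer to handle this case as well: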
def df_to_nested(dataframe, _groupby, level, col):
    """
    - dataframe: source data
    - _groupby: groupby columns
    - level: start from this level (pass 0 for the top level)
    - col: value to aggregate
    """
    result = []
    if level >= len(_groupby):
        return result  # Reached max depth: no more level columns
    df = dataframe.groupby(_groupby[level])  # blank (NaN) cells are dropped
    if len(dataframe) == 1:
        # A single row can still have a value at this level: keep it as a leaf
        for key, val in df:
            result.append((key, int(val[col].sum()), []))
    else:
        level += 1  # Level0 -> Level1 (increase level)
        for key, val in df:  # Iterate through groups
            result.append((key, int(val[col].sum()), df_to_nested(val, _groupby, level, col)))
        level -= 1  # Level1 -> Level0 (decrease level)
    return result
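The single-row check still stops one level down: if a group has one row whose path continues for several more levels (for example a lone /,usr,src,linux-headers,4 row under usr), only src is kept and linux-headers is lost. A minimal sketch of an alternative, assuming the same Level0..Level3 layout, recurses on the level index instead of the row count; `df_to_nested_full` is a hypothetical name, and the two-row frame is a subset of the question's data.

```python
import pandas as pd


def df_to_nested_full(dataframe, level_cols, col, level=0):
    """Recurse until the level columns run out, whatever the row count.

    Rows whose cell at `level` is blank (NaN/None) are dropped by groupby's
    default dropna=True, so they only add to their ancestors' totals.
    """
    if level >= len(level_cols):
        return []  # no deeper level column
    result = []
    for key, group in dataframe.groupby(level_cols[level]):
        total = int(group[col].sum())
        result.append((key, total, df_to_nested_full(group, level_cols, col, level + 1)))
    return result


# Each Level1 group here has a single row; usr's row spans two more levels.
df = pd.DataFrame({
    "Level0": ["/", "/"],
    "Level1": ["home", "usr"],
    "Level2": ["Images", "src"],
    "Level3": [None, "linux-headers"],
    "Volume": [40, 4],
})
result = df_to_nested_full(df, ["Level0", "Level1", "Level2", "Level3"], "Volume")
print(result)
```

Stopping on the level index also means duplicate full paths (two identical rows) no longer run past the last column.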