I have the following dataframe:
import numpy as np
import pandas as pd
# define categorical column.
grps = pd.DataFrame(['a', 'a', 'a', 'b', 'b', 'b'])
# generate dataframe.
df = pd.DataFrame(np.random.randn(18).reshape(6, 3))
# concatenate categorical column and dataframe.
df = pd.concat([grps, df], axis = 1)
# Assign column headers.
df.columns = ['group', 1, 2, 3]
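In general, my dataframe may contain a varying number of levels in the categorical column, i.e. 'a', 'b', 'c', 'd' and so on.
I can then use the .bar() method to generate a styled pandas dataframe and write it to an html file: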
# style the dataframe.
style_df = (df.style.bar(align = 'zero', color = '#FFA07A'))
# write styled dataframe to html.
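# Note: pandas 2.0 removed hide_index() and render(); there the
# equivalent is style_df.hide(axis="index").to_html().
df_html = style_df.hide_index().render()
with open("style_df.html","w") as fp:
    fp.write(df_html)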
How can I color the bars in each numeric column according to the group category column?
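I tried using pd.IndexSlice to create subsets of the main dataframe by 'group' and then passing them to the .bar() method, as in "Pandas style.bar color based on condition?". However, this raised the following error: IndexingError: Too many indexers. Even if this had worked it would not be ideal, because I would need to chain successive .bar() calls onto the styler by hand. Ideally, I would like the code to react to the different group levels in any given dataframe.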
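I think using the built-in Styler.apply method for conditional formatting might be the best option, but none of the examples here, https://pandas.pydata.org/pandas-docs/stable/user_guide/style.html, seems to fit. They are all based on formatting the cell background color or the value itself.
Any pointers would be much appreciated.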
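For what it is worth, the chained per-group .bar() calls described above could be generated in a loop instead of by hand. The sketch below is only an assumption about how that might look, not a confirmed fix for the error above: `group_bar_specs` and `bar_by_group` are hypothetical helper names, the palette is arbitrary, and it relies on `Styler.bar` accepting a `subset` built with `pd.IndexSlice`.

```python
import numpy as np
import pandas as pd


def group_bar_specs(df, group_col="group",
                    palette=("#FFA07A", "#87CEFA", "#90EE90")):
    """Pair each group level's row labels with a colour (hypothetical helper)."""
    specs = []
    # pd.unique keeps the levels in order of first appearance.
    for i, level in enumerate(pd.unique(df[group_col])):
        rows = df.index[df[group_col] == level]
        # Cycle through the palette if there are more levels than colours.
        specs.append((list(rows), palette[i % len(palette)]))
    return specs


def bar_by_group(df, group_col="group"):
    """Chain one .bar() call per group level (hypothetical helper)."""
    value_cols = [c for c in df.columns if c != group_col]
    styler = df.style
    for rows, colour in group_bar_specs(df, group_col):
        # Each call only styles the rows belonging to one group level.
        styler = styler.bar(subset=pd.IndexSlice[rows, value_cols],
                            align="zero", color=colour)
    return styler


# Same layout as the example dataframe above.
df = pd.DataFrame(np.random.randn(18).reshape(6, 3), columns=[1, 2, 3])
df.insert(0, "group", ["a", "a", "a", "b", "b", "b"])
specs = group_bar_specs(df)
```

Calling `bar_by_group(df)` would then return a styler that can be written to html as before. Since each .bar() call only sees its own subset, bars are scaled within each group; passing the same `vmin` and `vmax` to every call is one way to keep bar lengths comparable across groups.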
Answer 0: (score: 0)
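I found a way to do this by adapting the code from this post: How to set a cell not column or row in a dataframe with color?
It made more sense to color the cell backgrounds by group category than to color the bars by group category. I hope this helps as a visual cue in larger tables.
I had to define a function that carries out this process, which can then be applied to the table through the pandas styler.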
import pandas as pd
import numpy as np
# define categorical column.
grps = pd.DataFrame(['a', 'a', 'a', 'b', 'b', 'b'])
# generate dataframe.
df = pd.DataFrame(np.random.randn(18).reshape(6, 3))
# concatenate categorical column and dataframe.
df = pd.concat([grps, df], axis = 1)
# Assign column headers.
df.columns = ['group', 1, 2, 3]
Function to highlight rows by the class variable.
def highlight_rows(x):
    """ Function to apply alternating colour scheme to table cells by rows
    according to groups.
    Parameters:
    x: dataframe to be styled.
    Returns:
    Styled dataframe
    """
    # ----------------------------------------------------------------------- #
    ### Set initial condition.
    # Generate copy of input dataframe. This will avoid chained indexing issues.
    df_cpy = x.copy()
    # ----------------------------------------------------------------------- #
    ### Define row index ranges per experimental group.
    # Reset numerical index in dataframe copy. Generates new column at
    # position 1 called 'index' and consisting of index positions.
    df_cpy = df_cpy.reset_index()
    # Generate dictionary of key:value pairs corresponding to
    # grouped experimental class:index range as numerical list, respectively.
    grp_indexers_dict = dict(tuple((df_cpy.groupby('group')['index'])))
    # Generate list of series from dictionary values.
    indexers_series_lst = list(grp_indexers_dict.values())
    # Drop first column - 'index'. This is necessary to avoid 'ValueError'
    # issue at a later stage. This is due to the extra column causing dataframe
    # mismatching when this function is called from 'style_df()' function.
    df_cpy = df_cpy.drop('index', axis = 1)
    # ----------------------------------------------------------------------- #
    ### Initiate 'try' block.
    try:
        # Set default color as no colour.
        df_cpy.loc[:,:] = ''
        # Set row colour by referencing elements of a list of series.
        # Each series corresponds to the numerical row index positions
        # for each group class. They therefore represent each class.
        # They are generated dynamically from the input dataframe group column
        # in the 'style_df()' function, from which this function is called.
        # Numerical series can be used to slice a dataframe and specifically
        # pass colour schemes to row subsets.
        # Note: 4 experimental groups defined below in order to account
        # for higher number of group levels. The idea is that these should
        # always be in excess of total groups.
        # Group - 1.
        df_cpy.iloc[indexers_series_lst[0], ] = 'background-color: #A7CDDD'
        # Group - 2.
        df_cpy.iloc[indexers_series_lst[1], ] = 'background-color: #E3ECF8'
        # Group - 3.
        df_cpy.iloc[indexers_series_lst[2], ] = 'background-color: #A7CDDD'
        # Group - 4.
        df_cpy.iloc[indexers_series_lst[3], ] = 'background-color: #E3ECF8'
        # Return styled dataframe if total experimental classes equal
        # to total defined groups above.
        return(df_cpy)
    # ----------------------------------------------------------------------- #
    ### Initiate 'except' block.
    # Catches index error generated when there are fewer experimental
    # groups than defined in preceding 'try' block.
    except IndexError:
        # Return styled dataframe.
        return(df_cpy)
Pass the function to the styler and generate the styled html table.
# style the dataframe.
style_df = (df.style
            .bar(align = 'zero', color = '#FFA07A')
            # Call 'highlight_rows()' function to colour rows by group class.
            .apply(highlight_rows, axis=None))
# write styled dataframe to html.
# Note: pandas 2.0 removed hide_index() and render(); there the
# equivalent is style_df.hide(axis="index").to_html().
df_html = style_df.hide_index().render()
with open("style_df.html","w") as fp:
    fp.write(df_html)
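Although this works perfectly well for the kind of data I handle (it is very unlikely to go beyond 10 groups, so in my actual code the function has up to 10 indexers), it is not as elegant as a function that responds dynamically to the number of groups.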
If anyone figures out a way to do that I would still be interested, but I just couldn't work it out. I hope this helps someone with their stylers!
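One possible way to make the highlighting follow the number of groups is to number the group levels and cycle through a palette, rather than listing one indexer per group. This is a sketch under assumptions, not the code from the answer: `highlight_rows_by_group`, its `group_col` and `palette` parameters, and the sample dataframe are all hypothetical, and the two colours are simply reused from the function above.

```python
import numpy as np
import pandas as pd


def highlight_rows_by_group(x, group_col="group",
                            palette=("#A7CDDD", "#E3ECF8")):
    """Return a dataframe of CSS strings, one colour per group level.

    Hypothetical helper: colours alternate through `palette`, so any
    number of group levels is handled without editing the function.
    """
    # factorize numbers the levels 0, 1, 2, ... in order of first appearance.
    codes, _ = pd.factorize(x[group_col])
    css = np.array([f"background-color: {c}" for c in palette])
    # Pick one CSS string per row, wrapping around the palette.
    row_css = css[codes % len(palette)]
    # Repeat each row's style across every column; Styler.apply with
    # axis=None expects the same index and columns as the input.
    return pd.DataFrame(np.repeat(row_css[:, None], x.shape[1], axis=1),
                        index=x.index, columns=x.columns)


# Small example with three group levels.
demo = pd.DataFrame({"group": ["a", "a", "b", "b", "c"],
                     1: [0.1, -0.2, 0.3, 0.4, -0.5]})
styles = highlight_rows_by_group(demo)
```

It would be passed to the styler in the same way as before, e.g. `.apply(highlight_rows_by_group, axis=None)`. Using `factorize` avoids both the `try`/`except` block and the fixed number of indexers, at the cost of the colours being tied to the order in which the groups first appear.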