Pandas styled dataframe - colour bars by categorical column

Time: 2020-02-23 14:57:39

Tags: pandas dataframe styles

I have the following dataframe:

import pandas as pd
import numpy as np

# define categorical column.
grps = pd.DataFrame(['a', 'a', 'a', 'b', 'b', 'b'])

# generate dataframe.
df = pd.DataFrame(np.random.randn(18).reshape(6, 3))

# concatenate categorical column and dataframe.
df = pd.concat([grps, df], axis = 1)

# Assign column headers.
df.columns = ['group', 1, 2, 3]

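In general, my dataframe may contain a varying number of levels in the categorical column, i.e. 'a', 'b', 'c', 'd', etc.

I can then use the .bar() method to generate a styled pandas dataframe and write it to an html file: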

# style the dataframe.
style_df = (df.style.bar(align = 'zero', color = '#FFA07A'))

# write styled dataframe to html.
df_html = style_df.hide_index().render()
with open("style_df.html","w") as fp:
    fp.write(df_html)

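How can I colour the bars in each numeric column according to the group categorical column?

I tried using pd.IndexSlice to create subsets of the main dataframe by 'group' and then passing them to the .bar() method, as in "Pandas style.bar color based on condition?". However, I get the following error: IndexingError: Too many indexers. Even if this did work, it would not be ideal, because I would need to chain successive .bar() calls onto the styler by hand. Ideally, I would like the code to react to the different group levels of any given dataframe.

I think conditional formatting with the built-in Styler.apply method might be the best option, but the examples at https://pandas.pydata.org/pandas-docs/stable/user_guide/style.html did not get me anywhere. They are all based on formatting the background colour of a cell or the values themselves.

Any pointers would be much appreciated.

1 Answer:

Answer 0: (score: 0)

I found a way to do this by adapting the code from this post: How to set a cell not column or row in a dataframe with color?

It made more sense to colour the cell backgrounds by group category than to colour the bars by group category. I hope this helps as a visual cue in larger tables.

I had to define a function that carries out this process, which can then be applied to the table through the pandas Styler.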

import pandas as pd
import numpy as np

# define categorical column.
grps = pd.DataFrame(['a', 'a', 'a', 'b', 'b', 'b']) 

# generate dataframe.
df = pd.DataFrame(np.random.randn(18).reshape(6, 3))

# concatenate categorical column and dataframe.
df = pd.concat([grps, df], axis = 1)

# Assign column headers.
df.columns = ['group', 1, 2, 3]

Function to highlight rows by the class variable.

def highlight_rows(x):
    """ Function to apply alternating colour scheme to table cells by rows
    according to groups. 

    Parameters:
    x: dataframe to be styled.

    Returns:
    Dataframe of CSS strings with the same shape as x.

    """
    # ----------------------------------------------------------------------- #
    ### Set initial condition.

    # Generate copy of input dataframe. This will avoid chained indexing issues.
    df_cpy = x.copy()

    # ----------------------------------------------------------------------- #
    ### Define row index ranges per experimental group.

    # Reset numerical index in dataframe copy. Generates a new first column
    # called 'index' consisting of the original index positions.
    df_cpy = df_cpy.reset_index()

    # Generate dictionary of key:value pairs corresponding to 
    # grouped experimental class:index range as numerical list, respectively.
    grp_indexers_dict = dict(tuple((df_cpy.groupby('group')['index'])))

    # Generate list of series from dictionary values.
    indexers_series_lst = list(grp_indexers_dict.values())

    # Drop first column - 'index'. This is necessary to avoid 'ValueError' 
    # issue at a later stage. This is due to the extra column causing dataframe 
    # mismatching when this function is called from 'style_df()' function.
    df_cpy = df_cpy.drop('index', axis = 1)

    # ----------------------------------------------------------------------- #
    ### Initiate 'try' block.

    try:
        # Cast to object so that CSS strings can be written into numeric
        # columns; newer pandas versions may warn or raise otherwise.
        df_cpy = df_cpy.astype(object)

        # Set default colour as no colour.
        df_cpy.loc[:, :] = ''

        # Set row colour by referencing elements of a list of series.
        # Each series corresponds to the numerical row index positions
        # for each group class. They therefore represent each class.
        # They are generated dynamically from the input dataframe group column
        # in the 'style_df()' function, from which this function is called.
        # Numerical series can be used to slice a dataframe and specifically
        # pass colour schemes to row subsets.
        # Note: 4 experimental groups defined below in order to account
        # for higher number of group levels. The idea is that these should
        # always be in excess of total groups.

        # Group - 1.
        df_cpy.iloc[indexers_series_lst[0], ] = 'background-color: #A7CDDD'
        # Group - 2.
        df_cpy.iloc[indexers_series_lst[1], ] = 'background-color: #E3ECF8'
        # Group - 3.
        df_cpy.iloc[indexers_series_lst[2], ] = 'background-color: #A7CDDD'
        # Group - 4.
        df_cpy.iloc[indexers_series_lst[3], ] = 'background-color: #E3ECF8'

        # Return styled dataframe if total experimental classes equal
        # to total defined groups above.
        return df_cpy

    # ----------------------------------------------------------------------- #
    ### Initiate 'except' block.

    # Catches index error generated when there are fewer experimental
    # groups than defined in preceding 'try' block.
    except IndexError:

        # Return styled dataframe.
        return df_cpy

Pass the function to the styler and generate the styled html table.

# style the dataframe.
style_df = (df.style
            .bar(align = 'zero', color = '#FFA07A')
            # Call 'highlight_rows()' function to colour rows by group class.
            .apply(highlight_rows, axis=None))

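# write styled dataframe to html.
# Note: hide_index() and render() were deprecated in pandas 1.4 and removed
# in 2.0; on newer versions use style_df.hide(axis="index").to_html().
df_html = style_df.hide_index().render()
with open("style_df.html","w") as fp:
    fp.write(df_html)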

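Although this works well for the kind of data I handle (I am very unlikely to go beyond 10 groups, so the function in my actual code has up to 10 indexers), it is not as elegant as having the function respond dynamically to the number of groups.

If anyone figures out a way to do that, I would still be interested, but I just couldn't work it out. I hope this helps someone with their stylers!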
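One way to make the function respond to however many groups are present is to number the groups and cycle through a palette. The following is a minimal sketch, not the code used above: `highlight_rows_by_group` and its `group_col` and `palette` parameters are hypothetical names introduced here, and it numbers groups in order of first appearance rather than in sorted order.

```python
import numpy as np
import pandas as pd


def highlight_rows_by_group(x, group_col="group",
                            palette=("#A7CDDD", "#E3ECF8")):
    """Return a dataframe of CSS strings that shades rows by group.

    Hypothetical helper: the name and parameters are assumptions made for
    this sketch, not part of pandas.
    """
    # Number each group 0, 1, 2, ... in order of first appearance.
    codes = pd.factorize(x[group_col])[0]

    # Cycle through the palette so any number of groups is covered.
    css = np.array([f"background-color: {c}" for c in palette])
    row_css = css[codes % len(palette)]

    # Repeat each row's CSS string across every column, keeping the same
    # index and columns as the input, as Styler.apply(axis=None) expects.
    return pd.DataFrame(
        np.repeat(row_css[:, None], x.shape[1], axis=1),
        index=x.index,
        columns=x.columns,
    )


# Usage, mirroring the styler chain above:
# style_df = (df.style
#             .bar(align='zero', color='#FFA07A')
#             .apply(highlight_rows_by_group, axis=None))
```

Because the colour is derived from the group number, there is no hard-coded list of indexers and no need for the try/except block; adjacent groups alternate as long as the palette has at least two colours.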
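For the original question of colouring the bars themselves by group, one possible approach is to call .bar() once per group level in a loop, restricting each call to that group's rows with pd.IndexSlice. This is only a sketch under assumptions: `group_colours`, `bar_by_group` and the default palette are hypothetical names and values introduced here, the index is assumed to be unique, and a shared vmin/vmax is passed so that bar lengths stay comparable across groups.

```python
from itertools import cycle

import pandas as pd


def group_colours(groups, palette=("#FFA07A", "#87CEFA", "#90EE90")):
    """Map each group level to a colour, reusing the palette cyclically.

    Hypothetical helper introduced for this sketch.
    """
    levels = pd.unique(groups)
    return dict(zip(levels, cycle(palette)))


def bar_by_group(df, group_col="group",
                 palette=("#FFA07A", "#87CEFA", "#90EE90")):
    """Return a Styler with one bar colour per group level (sketch)."""
    # Only numeric columns get bars.
    num_cols = df.select_dtypes("number").columns.tolist()

    # Use one common scale so bars are comparable between groups.
    vmin = float(df[num_cols].min().min())
    vmax = float(df[num_cols].max().max())

    styler = df.style
    for level, colour in group_colours(df[group_col], palette).items():
        # Row labels belonging to this group level.
        rows = df.index[df[group_col] == level]
        styler = styler.bar(subset=pd.IndexSlice[rows, num_cols],
                            align="zero", color=colour,
                            vmin=vmin, vmax=vmax)
    return styler
```

With the dataframe above, `bar_by_group(df)` would return a Styler that can be written to html in the same way as `style_df`.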