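pandas: adding a row to a categorical index

Time: 2020-01-30 15:14:50

Tags: pandas pandas-groupby

I have a situation where I want to group a dataset by a custom-defined week index, average each group, and then sum those averages into a "Total" row. I can do the first half, but when I try to add/insert the new "Total" row that sums the other rows, I get an error.

I tried creating this row in two different ways:

Method 1: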

week_index_avg_unit.loc['Total'] = week_index_avg_unit.sum()

TypeError: cannot append a non-category item to a CategoricalIndex

Method 2:

week_index_avg_unit.index.insert(['Total'], week_index_avg_unit.sum())

ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()

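I have used the first method many times in this kind of scenario, but this is the first time I have segmented the data into multiple categories, and I am fairly sure the CategoricalIndex type is where the problem lies.

Here is the format of my data: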

         date  organic  ppc   oa  other  content_partnership  total  \
0  2018-01-01      379  251  197     51                    0    878   
1  2018-01-02      880  527  405    217                    0   2029   
2  2018-01-03      859  589  403    323                    0   2174   
3  2018-01-04      835  533  409    335                    0   2112   
4  2018-01-05      760  449  355    272                    0   1836   

  year_month  day  weekday weekday_name week_index  
0    2018-01    1        0       Monday     Week 1  
1    2018-01    2        1      Tuesday     Week 1  
2    2018-01    3        2    Wednesday     Week 1  
3    2018-01    4        3     Thursday     Week 1  
4    2018-01    5        4       Friday     Week 1  

Here is the code:

import pandas as pd
import numpy as np
from datetime import datetime
import matplotlib.pyplot as plt
historicals = pd.read_csv("2018-2019_plants.csv")

# Capture dates for additional date columns
date_col = pd.to_datetime(historicals['date'])

historicals['year_month'] = date_col.dt.strftime("%Y-%m")
historicals['day'] = date_col.dt.day
historicals['weekday'] = date_col.dt.dayofweek
historicals['weekday_name'] = date_col.dt.day_name()

# create week ranges segment (7 day range)
historicals['week_index'] = pd.cut(historicals['day'],[0,7,14,21,28,32], labels=['Week 1','Week 2','Week 3','Week 4','Week 5'])

# Week Index Average (Units)
week_index_avg_unit = historicals[df_monthly_average].groupby(['week_index']).mean().astype(int)

type(week_index_avg_unit.index)
pandas.core.indexes.category.CategoricalIndex

Here is the week_index_avg_unit table:

            organic  ppc   oa  other  content_partnership  total  day  weekday
week_index                                                                    
Week 1          755  361  505    405                   22   2027    4        3
Week 2          787  360  473    337                   19   1959   11        3
Week 3          781  382  490    352                   18   2006   18        3
...

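1 answer:

Answer 0 (score: 0)

pd.CategoricalIndex is a special beast. It is immutable, and it only accepts labels that are already among its categories, so to pull this off you will probably need something like pd.CategoricalIndex.set_categories to add the new category first. See the pandas documentation: https://pandas.pydata.org/pandas-docs/version/0.23.4/generated/pandas.CategoricalIndex.html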
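A minimal sketch of that idea, assuming a small stand-in frame built from two columns of the question's table rather than the real grouped data. It uses `add_categories`, a sibling of `set_categories` that appends to the existing categories instead of replacing the whole list; whether this fits the asker's full pipeline is untested.

```python
import pandas as pd

# Toy stand-in for week_index_avg_unit (two columns copied from the question's table).
week_index_avg_unit = pd.DataFrame(
    {"organic": [755, 787, 781], "ppc": [361, 360, 382]},
    index=pd.CategoricalIndex(["Week 1", "Week 2", "Week 3"], name="week_index"),
)

# Compute the column totals before the index is modified.
totals = week_index_avg_unit.sum()

# Register "Total" as a valid category. add_categories returns a new index,
# so the result has to be assigned back to the frame.
week_index_avg_unit.index = week_index_avg_unit.index.add_categories("Total")

# With the category in place, the .loc enlargement from Method 1 goes through.
week_index_avg_unit.loc["Total"] = totals

print(week_index_avg_unit)
```

If the categorical dtype is not needed after the groupby, converting the index to plain labels first (for example `week_index_avg_unit.index = week_index_avg_unit.index.astype(str)`) is another option that lets Method 1 run unchanged.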