Pandas DataFrame NameError: I can print the DataFrame, but when I try to aggregate a column I get a "name '' is not defined" error

Time: 2021-04-26 12:39:03

Tags: python pandas dataframe aggregate

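Can anyone explain why I can't sum over the last DataFrame?

If there is a shorter way to split the tags and sum the frequencies, suggestions are welcome too.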

import pandas as pd  # data processing, CSV file I/O (e.g. pd.read_csv)

print("\nCalculate aggregates for tags :\n")
TagsDFGroupBy =  df.groupby(['Tags','Lab Location' ]).agg({'ADO ID': ['count']}).rename(columns={'ADO ID':'WorkItemCnt'}).reset_index()
print(TagsDFGroupBy)

which produces this output:

  | Tags                                                        | Labs    | WorkItemCNT
--|-------------------------------------------------------------|---------|------------
0 | A2040                                                       | RXY Lab | 1
1 | AWAITING COMMODITY QUAL                                     | RXY Lab | 1
2 | DNR                                                         | RXY Lab | 18
3 | DNR; MISSING SKU DOC                                        | RXY Lab | 17
4 | MISSING QUAL INTAKE REQUEST; MISSING SKU DOC; NEED HARDWARE | QXR Lab | 1
5 | MISSING SKU DOC                                             | RXY Lab | 2
6 | MISSING SKU DOC; NEED HARDWARE                              | RXY Lab | 1
7 | MISSING SKU DOC; NEED RA                                    | RXY Lab | 1
8 | NEED HARDWARE                                               | RXY Lab | 7
9 | NEED HARDWARE                                               | VYZ Lab | 4

Then I run this code to split the tags and sum the frequencies:


print("\nSplit tags by semicolon delimiter")
TagsDFGroupBy[['Tag1','Tag2','Tag3']] = TagsDFGroupBy.Tags.str.split(";",expand=True)

print("\nReplace none with blanks")  
mask = TagsDFGroupBy.applymap(lambda x: x is None)
cols = TagsDFGroupBy.columns[(mask).any()]
for col in TagsDFGroupBy[cols]:
    TagsDFGroupBy.loc[mask[col], col] = ''


print("\n3 different dataframes")  
TagsDFGroupBy1 = TagsDFGroupBy[['Lab Location','Tag1','WorkItemCnt']].rename(columns={'Tag1':'TagSplit'})
TagsDFGroupBy2 = TagsDFGroupBy[['Lab Location','Tag2','WorkItemCnt']].rename(columns={'Tag2':'TagSplit'})
TagsDFGroupBy3 = TagsDFGroupBy[['Lab Location','Tag3','WorkItemCnt']].rename(columns={'Tag3':'TagSplit'})


print("\nCombine 3 different dataframes into 1")  
TagsConcat = pd.concat([TagsDFGroupBy1, TagsDFGroupBy2, TagsDFGroupBy3], ignore_index=True)

# Get names of indexes for which TagSplit has a blank value
indexNames = TagsConcat[TagsConcat['TagSplit'] == '' ].index
# Delete these row indexes from dataFrame
TagsConcat.drop(indexNames , inplace=True)
TagsConcat.reset_index()
print('TagsConcat')
print(TagsConcat)

which produces this output:

   | Lab Location | TagSplit                    | WorkItemCnt
   |              |                             | count
---|--------------|-----------------------------|------------
0  | RXY LAB      | A2040                       | 1
1  | RXY LAB      | AWAITING COMMODITY QUAL     | 1
2  | RXY LAB      | DNR                         | 18
3  | RXY LAB      | DNR                         | 17
4  | QXR LAB      | MISSING QUAL INTAKE REQUEST | 1
5  | RXY LAB      | MISSING SKU DOC             | 2
6  | RXY LAB      | MISSING SKU DOC             | 1
7  | RXY LAB      | MISSING SKU DOC             | 1
8  | RXY LAB      | NEED HARDWARE               | 7
9  | VYZ LAB      | NEED HARDWARE               | 4
13 | RXY LAB      | MISSING SKU DOC             | 17
14 | QXR LAB      | MISSING SKU DOC             | 1
16 | RXY LAB      | NEED HARDWARE               | 1
17 | RXY LAB      | NEED RA                     | 1
24 | QXR LAB      | NEED HARDWARE               | 1

Finally, I tried using either of these:

TagsFinal.groupby(['Lab Location', 'TagSplit'])['WorkItemCnt'].sum()

TagsFinal     =  TagsConcat.groupby(['Lab Location', 'TagSplit']).agg({'WorkItemCnt': ['sum']})

I get this error:

KeyError: 'WorkItemCnt'
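
A note on the likely cause, offered as a sketch rather than a confirmed diagnosis: passing a list inside the dict, as in `agg({'ADO ID': ['count']})`, makes pandas return two-level (MultiIndex) columns. That would explain why the printed header shows `count` on its own row under `WorkItemCnt`, and why a lookup by the plain name `'WorkItemCnt'` may fail later. The `NameError` in the title is also consistent with calling `TagsFinal.groupby(...)` before `TagsFinal` has been assigned. The rows below are hypothetical; only the column names come from the question.

```python
import pandas as pd

# Hypothetical sample data; only the column names come from the question.
df = pd.DataFrame({
    "ADO ID": [1, 2, 3, 4],
    "Tags": ["DNR", "DNR", "NEED HARDWARE", "DNR; MISSING SKU DOC"],
    "Lab Location": ["RXY Lab", "RXY Lab", "VYZ Lab", "RXY Lab"],
})

# A list of functions inside the dict produces two-level (MultiIndex) columns.
nested = (df.groupby(["Tags", "Lab Location"])
            .agg({"ADO ID": ["count"]})
            .rename(columns={"ADO ID": "WorkItemCnt"})
            .reset_index())
print(nested.columns.tolist())

# Named aggregation keeps a single column level,
# so 'WorkItemCnt' is an ordinary column name.
flat = (df.groupby(["Tags", "Lab Location"])
          .agg(WorkItemCnt=("ADO ID", "count"))
          .reset_index())
print(flat.columns.tolist())

# With flat columns, selecting and summing 'WorkItemCnt' works.
totals = flat.groupby(["Lab Location", "Tags"])["WorkItemCnt"].sum()
print(totals)
```

If the first aggregation is written with named aggregation like this, the later split, concat, and `groupby(...)['WorkItemCnt'].sum()` steps operate on ordinary single-level column names.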

1 Answer:

Answer 0: (score: 1)

I think your code can be simplified: first split the Tags column, expand it with DataFrame.explode, and then aggregate the counts with GroupBy.size:

TagsFinal =  (df.assign(TagSplit = df['Tags'].str.split('; '))
                .explode('TagSplit')
                .groupby(['Labs', 'TagSplit'])
                .size()
                .reset_index(name='WorkItemCnt'))
                    

print(TagsFinal)
      Labs                     TagSplit  WorkItemCnt
0  QXR Lab  MISSING QUAL INTAKE REQUEST            1
1  QXR Lab              MISSING SKU DOC            1
2  QXR Lab                NEED HARDWARE            1
3  RXY Lab                        A2040            1
4  RXY Lab      AWAITING COMMODITY QUAL            1
5  RXY Lab                          DNR            2
6  RXY Lab              MISSING SKU DOC            4
7  RXY Lab                NEED HARDWARE            2
8  RXY Lab                      NEED RA            1
9  VYZ Lab                NEED HARDWARE            1
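
Note that `GroupBy.size` counts rows, so it gives work-item totals only when `df` holds one row per work item. If you start from an already aggregated table, like the first output in the question, one variation is to explode the tags and sum the existing counts instead. A minimal sketch, assuming a flat `WorkItemCnt` column and the `Labs` column name used in this answer; the three rows are copied from the question's first output, and the names `agg_df` and `tag_totals` are only illustrative:

```python
import pandas as pd

# Three rows copied from the question's first output (already aggregated counts).
agg_df = pd.DataFrame({
    "Tags": ["DNR", "DNR; MISSING SKU DOC", "MISSING SKU DOC"],
    "Labs": ["RXY Lab", "RXY Lab", "RXY Lab"],
    "WorkItemCnt": [18, 17, 2],
})

tag_totals = (
    agg_df.assign(TagSplit=agg_df["Tags"].str.split("; "))
          .explode("TagSplit")  # one row per (original row, tag) pair
          .groupby(["Labs", "TagSplit"], as_index=False)["WorkItemCnt"]
          .sum()                # add up the existing counts, not the rows
)
print(tag_totals)
```

For these rows, DNR totals 18 + 17 = 35 and MISSING SKU DOC totals 17 + 2 = 19 for RXY Lab.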