Sum by a column that is not the index variable (Python)

Time: 2019-07-17 17:14:24

Tags: python pandas indexing sum

New to Python here (my background is mostly in SAS).

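I am trying to sum by a column that is not the index variable. In the example below the index variable is 'department', and I am trying to sum by 'employee_fixed'. I can't make 'employee_fixed' the index variable, because the index variable is used as part of a for loop. The code below should make this clear.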
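
A minimal sketch of the general idea, assuming a reasonably recent pandas: `groupby` accepts a mix of index level names and column names, so you can group by the 'department' index and the 'employee_fixed' column together without changing the index. The second "Jeremy" row is hypothetical, added only so the sum has something to aggregate.

```python
import pandas as pd

# Rows shaped like the question's data, with 'department' as the index.
# The second Jeremy row (40 hrs) is made up for illustration.
df3_cc = pd.DataFrame(
    {
        "department": ["Furniture", "Furniture", "Food", "Food"],
        "employee_fixed": ["John", "Jacob", "Jeremy", "Jeremy"],
        "hrs_per": [45, 50, 75, 40],
    }
).set_index("department")

# 'department' is resolved as an index level, 'employee_fixed' as a column
totals = df3_cc.groupby(["department", "employee_fixed"])["hrs_per"].sum()
print(totals)
```

The result is a Series with a two-level index (department, employee_fixed), so each department's per-employee totals stay together.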

    # df is the full dataset, loaded earlier
    # Departments to keep; one CSV is written per department
    cc = ['Furniture', 'Food', 'Clothing']
    for index in range(len(cc)):
        # Keep only rows for the current department
        df3_cc = df[df['department'].isin([cc[index]])]
        # Set the department as the index variable so you can aggregate
        df3_cc = df3_cc.set_index('department')
        # Keep only people who are NOT approved for this department
        notapprov = ['NO']
        df3_cc = df3_cc[df3_cc['appr_list_chc'].isin(notapprov)]
        # Drop unnecessary columns from the DataFrame
        df3_cc = df3_cc.drop(['fisc_yr_per'], axis=1)
        # Sum up the hours based on the indexed departments
        # for those NOT approved to work that department and charging anyway
        # >= 40 hrs in the latest period.
        # Problem line: level= only accepts index level names,
        # and 'employee_fixed' is a column, not an index level
        df3_cc = df3_cc[df3_cc['hrs_per'] >= 40].sum(level='employee_fixed')
        # Output to CSV
        df3_cc.to_csv(r"C:\Users\etc\table3_" + cc[index] + ".csv")
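
For context on why the `sum(level=...)` line fails: `level` only refers to index levels, and 'employee_fixed' is an ordinary column. (The `level` keyword of `DataFrame.sum` was also deprecated in pandas 1.3 and removed in 2.0 in favour of `groupby(level=...)`.) One way around the for-loop constraint, sketched here on the sample Furniture rows, is `set_index(..., append=True)`, which adds 'employee_fixed' as a second index level while keeping 'department'.

```python
import pandas as pd

# Furniture rows from the sample input, after the question's filters
df3_cc = pd.DataFrame(
    {
        "department": ["Furniture", "Furniture"],
        "employee_fixed": ["John", "Jacob"],
        "appr_list_chc": ["NO", "NO"],
        "hrs_per": [45, 50],
    }
).set_index("department")

# append=True adds 'employee_fixed' as a second index level; 'department' stays
df3_cc = df3_cc.set_index("employee_fixed", append=True)

# Replacement for sum(level='employee_fixed'): group by that index level and
# sum only the numeric 'hrs_per' column (not the text column)
totals = df3_cc[df3_cc["hrs_per"] >= 40].groupby(level="employee_fixed")["hrs_per"].sum()
print(totals)
```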

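For each item in 'cc', the end result should be a separate CSV containing the total hours worked by each employee (in 'employee_fixed') in that department who is not authorized to work in that department and who worked 40 or more hours in the current period.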

Sample input:

    department  employee_fixed  appr_list_chc  hrs_per
    Furniture   John            NO             45
    Furniture   Jacob           NO             50
    Food        Jackie          YES            100
    Food        Jeremy          NO             75
    Food        Jim             NO             10
    Clothing    Jonas           NO             200
    Clothing    Jerry           YES            10

Output:

table3_furniture.csv

    department  employee_fixed  appr_list_chc  hrs_per
    Furniture   John            NO             45
    Furniture   Jacob           NO             50

table3_food.csv

    department  employee_fixed  appr_list_chc  hrs_per
    Food        Jeremy          NO             75

table3_clothing.csv

    department  employee_fixed  appr_list_chc  hrs_per
    Clothing    Jonas           NO             200
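
The expected tables are the sample rows filtered by `appr_list_chc == 'NO'` and `hrs_per >= 40`, then split by department. A minimal sketch that rebuilds the sample input as a DataFrame and reproduces them with boolean masks (the dict name `tables` is illustrative):

```python
import pandas as pd

# Sample input from the question
df = pd.DataFrame(
    {
        "department": ["Furniture", "Furniture", "Food", "Food", "Food", "Clothing", "Clothing"],
        "employee_fixed": ["John", "Jacob", "Jackie", "Jeremy", "Jim", "Jonas", "Jerry"],
        "appr_list_chc": ["NO", "NO", "YES", "NO", "NO", "NO", "YES"],
        "hrs_per": [45, 50, 100, 75, 10, 200, 10],
    }
)

# Not approved for the department AND at least 40 hours in the period
mask = (df["appr_list_chc"] == "NO") & (df["hrs_per"] >= 40)

# One filtered table per department, matching the expected CSV contents
tables = {dept: df[mask & (df["department"] == dept)] for dept in ["Furniture", "Food", "Clothing"]}
for dept, table in tables.items():
    print(dept)
    print(table.to_string(index=False))
```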

Thanks!

EDIT: Found the answer! The line

    df3_cc = df3_cc[df3_cc['hrs_per'] >= 40].sum(level='employee_fixed')

becomes

    df3_cc = df3_cc[df3_cc['hrs_per'] >= 40]

1 answer:

Answer 0 (score: 0)

It turns out only one line needed to change, from:

df3_cc = df3_cc[df3_cc['hrs_per'] >= 40].sum(level='employee_fixed')

to:

df3_cc = df3_cc[df3_cc['hrs_per'] >= 40]
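
With the `sum` removed, the loop just writes the filtered rows for each department. As an alternative sketch (not the answerer's code), one filter plus `groupby('department')` can drive the loop instead of a separate `isin` per department. It writes into a temporary directory rather than the question's `C:\Users\etc\` path; `out_dir` and `written` are illustrative names.

```python
import os
import tempfile

import pandas as pd

# Sample input from the question
df = pd.DataFrame(
    {
        "department": ["Furniture", "Furniture", "Food", "Food", "Food", "Clothing", "Clothing"],
        "employee_fixed": ["John", "Jacob", "Jackie", "Jeremy", "Jim", "Jonas", "Jerry"],
        "appr_list_chc": ["NO", "NO", "YES", "NO", "NO", "NO", "YES"],
        "hrs_per": [45, 50, 100, 75, 10, 200, 10],
    }
)

cc = ["Furniture", "Food", "Clothing"]

# Apply all three conditions once, for every department in cc
filtered = df[
    df["department"].isin(cc)
    & (df["appr_list_chc"] == "NO")
    & (df["hrs_per"] >= 40)
]

out_dir = tempfile.mkdtemp()
written = {}  # department -> path of the CSV written for it
for dept, rows in filtered.groupby("department"):
    path = os.path.join(out_dir, "table3_" + dept + ".csv")
    rows.to_csv(path, index=False)
    written[dept] = path
```

One design difference: unlike the question's loop, a department with no qualifying rows produces no group, and therefore no CSV file.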