Question

Python newbie here (my background is mostly in SAS).

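I'm trying to sum by a column that is not the index variable. In the example below, the index variable is 'department', and I'm trying to sum by 'employee_fixed'. I can't make 'employee_fixed' the index variable because the index variable is used as part of a for loop. The code below should make this clear.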
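In pandas, a column does not have to be in the index before you can aggregate by it: `groupby` accepts a column name directly, while `groupby(level=...)` is only for index levels. A minimal sketch, assuming a few made-up rows that reuse the question's column names:

```python
import pandas as pd

# Made-up rows using the question's column names (hypothetical data)
df = pd.DataFrame({
    "department": ["Furniture", "Furniture", "Food"],
    "employee_fixed": ["John", "John", "Jeremy"],
    "hrs_per": [45, 50, 75],
})

# 'department' becomes the index, as in the question's loop
indexed = df.set_index("department")

# Aggregate by an index level...
by_dept = indexed.groupby(level="department")["hrs_per"].sum()

# ...or by an ordinary column, leaving the index untouched
by_employee = indexed.groupby("employee_fixed")["hrs_per"].sum()

print(by_dept)
print(by_employee)
```

`DataFrame.sum(level=...)` only ever worked with index levels, and the `level` argument was removed from `DataFrame.sum` in pandas 2.0, so `groupby` is the more portable spelling either way.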

    import pandas as pd

    # df is assumed to already hold the source data (e.g. loaded with pd.read_csv)

    # Departments to keep in the dataset
    cc = ['Furniture', 'Food', 'Clothing']
    for index in range(len(cc)):
        # Keep only the rows for the current department
        df3_cc = df[df['department'].isin([cc[index]])]
        # Set department as the index so the data can be aggregated
        df3_cc = df3_cc.set_index('department')
        # Keep only people who are NOT approved for this department
        notapprov = ['NO']
        df3_cc = df3_cc[df3_cc['appr_list_chc'].isin(notapprov)]
        # Drop unnecessary columns
        df3_cc = df3_cc.drop(['fisc_yr_per'], axis=1)
        # Sum up hours per employee for those NOT approved to work in this
        # department but charging to it anyway, with >= 40 hrs in the latest period.
        # This line fails: 'employee_fixed' is a column, not an index level.
        df3_cc = df3_cc[df3_cc['hrs_per'] >= 40].sum(level='employee_fixed')
        # Write one CSV per department
        df3_cc.to_csv(r"C:\Users\etc\table3_" + cc[index] + ".csv")

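For each item in cc, the end result should be a separate CSV listing every employee (in 'employee_fixed') who is not authorized to work in that department but worked there anyway, with 40 or more hours in the current period, along with their total hours.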
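The goal above (total hours per unapproved employee, per department) can be sketched with one boolean mask and a `groupby` on the ordinary `employee_fixed` column. This is only a sketch: the function name `unapproved_hours` and its `min_hours` parameter are hypothetical, the rows are made up, and, like the question's code, it applies the 40-hour threshold to each row before summing. If the threshold should apply to each employee's total instead, filter after grouping.

```python
import pandas as pd

def unapproved_hours(df: pd.DataFrame, dept: str, min_hours: int = 40) -> pd.DataFrame:
    """Hypothetical helper: total hrs_per per employee_fixed in one department,
    counting only rows where appr_list_chc is "NO" and hrs_per >= min_hours."""
    mask = (
        (df["department"] == dept)
        & (df["appr_list_chc"] == "NO")
        & (df["hrs_per"] >= min_hours)
    )
    # Group by the ordinary column; no set_index needed
    return df.loc[mask].groupby("employee_fixed", as_index=False)["hrs_per"].sum()

# Made-up rows: John charges Furniture in two separate rows
df = pd.DataFrame({
    "department": ["Furniture", "Furniture", "Furniture", "Food"],
    "employee_fixed": ["John", "John", "Jacob", "Jeremy"],
    "appr_list_chc": ["NO", "NO", "NO", "NO"],
    "hrs_per": [45, 40, 50, 75],
})

furniture = unapproved_hours(df, "Furniture")
print(furniture)
```

Calling `unapproved_hours(df, dept).to_csv(...)` inside a `for dept in cc:` loop would then give one file per department.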

Sample input:

    department  employee_fixed  appr_list_chc  hrs_per
    Furniture   John            NO             45
    Furniture   Jacob           NO             50
    Food        Jackie          YES            100
    Food        Jeremy          NO             75
    Food        Jim             NO             10
    Clothing    Jonas           NO             200
    Clothing    Jerry           YES            10

Output:

table3_furniture.csv

    department  employee_fixed  appr_list_chc  hrs_per
    Furniture   John            NO             45
    Furniture   Jacob           NO             50

table3_food.csv

    department  employee_fixed  appr_list_chc  hrs_per
    Food        Jeremy          NO             75

table3_clothing.csv

    department  employee_fixed  appr_list_chc  hrs_per
    Clothing    Jonas           NO             200
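The expected output above keeps the matching rows as they are, without aggregating them. A minimal sketch that rebuilds the sample input as a DataFrame and reproduces those tables in memory, assuming the three conditions from the question (department, appr_list_chc of "NO", hrs_per of at least 40):

```python
import pandas as pd

# The sample input from the question, rebuilt as a DataFrame
df = pd.DataFrame({
    "department": ["Furniture", "Furniture", "Food", "Food", "Food", "Clothing", "Clothing"],
    "employee_fixed": ["John", "Jacob", "Jackie", "Jeremy", "Jim", "Jonas", "Jerry"],
    "appr_list_chc": ["NO", "NO", "YES", "NO", "NO", "NO", "YES"],
    "hrs_per": [45, 50, 100, 75, 10, 200, 10],
})

cc = ["Furniture", "Food", "Clothing"]
tables = {}
for dept in cc:
    # Department matches, not approved, and 40 or more hours
    mask = (
        (df["department"] == dept)
        & (df["appr_list_chc"] == "NO")
        & (df["hrs_per"] >= 40)
    )
    tables[dept] = df.loc[mask].set_index("department")

for dept, table in tables.items():
    print(f"table3_{dept}.csv")
    print(table)
```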

Thanks!

Edit: Found the answer! This line:

    df3_cc = df3_cc[df3_cc['hrs_per'] >= 40].sum(level='employee_fixed')

became:

    df3_cc = df3_cc[df3_cc['hrs_per'] >= 40]

Answer 1

It turns out only one line needed to change, from:

    df3_cc = df3_cc[df3_cc['hrs_per'] >= 40].sum(level='employee_fixed')

To:

    df3_cc = df3_cc[df3_cc['hrs_per'] >= 40]
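With that change the loop only filters rows. A sketch of the same idea as a function that writes one CSV per department is below. It swaps in a different technique from the question's per-department `isin` loop: it filters once, then splits the result with `groupby("department")`. The function name `export_unapproved`, its `out_dir` and `min_hours` parameters, and the made-up rows are hypothetical, and a temporary directory stands in for the question's output folder.

```python
import os
import tempfile

import pandas as pd

def export_unapproved(df, departments, out_dir, min_hours=40):
    """Hypothetical helper: write table3_<department>.csv for each department
    that has rows with appr_list_chc == "NO" and hrs_per >= min_hours.
    Rows are kept as-is (no aggregation). Returns the written paths."""
    keep = (
        df["department"].isin(departments)
        & (df["appr_list_chc"] == "NO")
        & (df["hrs_per"] >= min_hours)
    )
    paths = []
    # Filter once, then split the remaining rows by department
    for dept, group in df.loc[keep].groupby("department"):
        path = os.path.join(out_dir, f"table3_{dept}.csv")
        group.set_index("department").to_csv(path)
        paths.append(path)
    return paths

# Made-up rows; a throwaway directory stands in for the real output folder
df = pd.DataFrame({
    "department": ["Furniture", "Food", "Food"],
    "employee_fixed": ["John", "Jackie", "Jeremy"],
    "appr_list_chc": ["NO", "YES", "NO"],
    "hrs_per": [45, 100, 75],
})
out_dir = tempfile.mkdtemp()
paths = export_unapproved(df, ["Furniture", "Food", "Clothing"], out_dir)
print(paths)
```

One behavioral difference: a department with no matching rows (Clothing here) gets no file at all, whereas the question's loop would write a header-only CSV for it.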

Summing by a column that is not the index variable (Python)
