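Python newbie here (my background is mostly in SAS).
I'm trying to sum by a column that is not the index variable (in the example below, the index variable is 'department' and I'm trying to sum by 'employee_fixed'). I can't make it the index variable, because the index variable is used as part of the for loop. The code below should make this clear.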
#Creating dataset of departments you want to keep in your dataset
#Setting df to only include departments specified
cc = ['Furniture','Food','Clothing']
for index in range(len(cc)):
    df3_cc = df[df['department'].isin([cc[index]])]
    #set the department as the index variable so you can aggregate
    df3_cc = df3_cc.set_index('department')
    df3_cc
    #Creating dataset of people who are NOT approved department
    #Setting df to only include the condition specified in "notapprov"
    notapprov = ['NO']
    df3_cc = df3_cc[df3_cc['appr_list_chc'].isin(notapprov)]
    df3_cc
    #drop unnecessary columns from dataframe
    df3_cc = df3_cc.drop(['fisc_yr_per'], axis=1)
    # sum up the hours based on the indexed departments
    # for those NOT approved to work that department and charging anyway
    # >40hrs in the latest period
    df3_cc = df3_cc[df3_cc['hrs_per'] >= 40].sum(level='employee_fixed')
    #output to CSV
    df3_cc.to_csv(r"C:\Users\etc\table3_"+cc[index]+".csv")
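For each item in 'cc', the end result should be a separate CSV containing the total hours for each employee (in 'employee_fixed') who is not approved to work in that department and who worked >= 40 hours in the current period.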
Sample input:
department  employee_fixed  appr_list_chc  hrs_per
Furniture   John            NO             45
Furniture   Jacob           NO             50
Food        Jackie          YES            100
Food        Jeremy          NO             75
Food        Jim             NO             10
Clothing    Jonas           NO             200
Clothing    Jerry           YES            10

Output:
table3_furniture.csv
department  employee_fixed  appr_list_chc  hrs_per
Furniture   John            NO             45
Furniture   Jacob           NO             50

table3_food.csv
department  employee_fixed  appr_list_chc  hrs_per
Food        Jeremy          NO             75

table3_clothing.csv
department  employee_fixed  appr_list_chc  hrs_per
Clothing    Jonas           NO             200
Thanks!
EDIT: Found the answer!
df3_cc = df3_cc[df3_cc['hrs_per'] >= 40].sum(level='employee_fixed')
becomes
df3_cc = df3_cc[df3_cc['hrs_per'] >= 40]
Answer 0 (score: 0)
It turned out that only one line needed to change, from:
df3_cc = df3_cc[df3_cc['hrs_per'] >= 40].sum(level='employee_fixed')
to:
df3_cc = df3_cc[df3_cc['hrs_per'] >= 40]
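The change works because the frame is indexed by 'department', so there is no index level named 'employee_fixed' for sum(level=...) to use; dropping the sum leaves the filtered rows, which matches the sample output. If per-employee totals are actually wanted, a groupby on the ordinary column is one way to get them without touching the index. Below is a minimal sketch under that assumption. The helper name unapproved_hours and its min_hours parameter are made up for illustration, and the rows are the sample input from the question rather than real data.

```python
import pandas as pd

# Sample rows from the question; the real `df` is assumed to have these columns.
df = pd.DataFrame(
    {
        "department": ["Furniture", "Furniture", "Food", "Food", "Food", "Clothing", "Clothing"],
        "employee_fixed": ["John", "Jacob", "Jackie", "Jeremy", "Jim", "Jonas", "Jerry"],
        "appr_list_chc": ["NO", "NO", "YES", "NO", "NO", "NO", "YES"],
        "hrs_per": [45, 50, 100, 75, 10, 200, 10],
    }
)


def unapproved_hours(frame, department, min_hours=40):
    """Hypothetical helper: total hours per unapproved employee in one department."""
    mask = (
        (frame["department"] == department)
        & (frame["appr_list_chc"] == "NO")
        & (frame["hrs_per"] >= min_hours)
    )
    # groupby operates on a regular column, so set_index is not needed
    return frame[mask].groupby("employee_fixed", as_index=False)["hrs_per"].sum()


# One result frame per department, keyed by department name
results = {dept: unapproved_hours(df, dept) for dept in ["Furniture", "Food", "Clothing"]}
```

Each frame in results can then be written out with to_csv(..., index=False), one file per department, in the same way as the original loop.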