在单行中汇总熊猫数据框

时间:2020-10-07 19:45:46

标签: python pandas dataframe row summarize

我希望在将下面详述的数据框汇总为一行摘要时有所帮助,如所需的输出进一步显示在页面下方。预先非常感谢。

employees = {'Name of Employee': ['Mark','Mark','Mark','Mark','Mark','Mark', 'Mark','Mark','Mark','Mark','Mark','Mark','Mark'],
                         'Department': ['21','21','21','21','21','21', '21','21','21','21','21','21','21'],
                         'Team': ['2','2','2','2','2','2','2','2','2','2','2','2','2'],
                         'Log': ['2020-02-19 09:01:17', '2020-02-19 09:54:02', '2020-04-10 11:00:31', '2020-04-11 12:39:08', '2020-04-18 09:45:22', '2020-05-05 09:01:17', '2020-05-23 09:54:02', '2020-07-03 11:00:31', '2020-07-03 12:39:08', '2020-07-04 09:45:22', '2020-07-05 09:01:17', '2020-07-06 09:54:02', '2020-07-06 11:00:31'],
                         'Call Duration' : ['0.01178', '0.01736','0.01923','0.00911','0.01007','0.01206','0.01256','0.01006','0.01162','0.00733','0.01250','0.01013','0.01308'],
                         'ITT': ['NO','YES', 'NO', 'Follow up', 'YES','YES', 'NO', 'Follow up','YES','YES', 'NO','YES','YES']
                        }
            
df = pd.DataFrame(employees)
    

所需的输出:

Name  Dept  Team     Start      End      Weeks Total Calls  Ave. Call time  Sold  Rejected  more info
Mark   21    2    2020-02-19 2020-07-06  19.71      13          0.01207       7       4        2

我要应用的逻辑是(尽管我猜我在下面编写的语法中有错误,但我希望您仍然能够理解计算):

  • 开始= df ['Log']中的最小日期
  • End = df ['Log']中的最大日期
  • 周=(df ['log']中的最大日期-df ['Log']中的最小日期)/ 7
  • 总通话次数= df ['Log']。count
  • 晚上。通话时间=(df ['通话时间'] .sum)/(df ['Log']。count)
  • 已售=(df ['ITT'] =='YES')。count
  • 已拒绝=(df ['ITT'] =='NO')。count
  • 更多信息=(df ['ITT'] =='关注').count

2 个答案:

答案 0 :(得分:3)

pd.NamedAgggroupby一起使用:

df['Log'] = pd.to_datetime(df['Log'])
df['Call Duration'] = df['Call Duration'].astype(float)

df.groupby(['Name of Employee', 'Team', 'Department'])\
  .agg(Start = ('Log','min'),
       End = ('Log', 'max'),
        Weeks = ('Log', lambda x: np.ptp(x) / np.timedelta64(7, 'D')),
        Total_Calls = ('Log', 'count'),
        Avg_Call_Time = ('Call Duration', 'mean'),
        Sold = ('ITT', lambda x: (x == 'YES').sum()),
        Rejected = ('ITT', lambda x: (x == 'NO').sum()),
        More_info = ('ITT', lambda x: (x=='Follow up').sum()))

输出:

                                               Start                 End      Weeks  Total_Calls  Avg_Call_Time  Sold  Rejected  More_info
Name of Employee Team Department                                                                                                          
Mark             2    21         2020-02-19 09:01:17 2020-07-06 11:00:31  19.726114           13       0.012068     7         4          2

答案 1 :(得分:0)

U出现语法错误,您忘记了在每个键的末尾添加逗号。 现在您可以在此数据框上工作了。

<a href="/images/myw3schoolsimage.jpg" download="w3logo">

输出:-

import pandas as pd
    employees = {'Name=': ['Mark','Mark','Mark','Mark','Mark','Mark', 'Mark','Mark','Mark','Mark','Mark','Mark','Mark'],
                             'Department': ['21','21','21','21','21','21', '21','21','21','21','21','21','21'],
                             'Team': ['2','2','2','2','2','2','2','2','2','2','2','2','2'],
                             'Log': ['2020-02-19 09:01:17', '2020-02-19 09:54:02', '2020-04-10 11:00:31', '2020-04-11 12:39:08', '2020-04-18 09:45:22', '2020-05-05 09:01:17', '2020-05-23 09:54:02', '2020-07-03 11:00:31', '2020-07-03 12:39:08', '2020-07-04 09:45:22', '2020-07-05 09:01:17', '2020-07-06 09:54:02', '2020-07-06 11:00:31'],
                             'Call Duration' : ['0.01178', '0.01736','0.01923','0.00911','0.01007','0.01206','0.01256','0.01006','0.01162','0.00733','0.01250','0.01013','0.01308'],
                             'ITT': ['NO','YES', 'NO', 'Follow up', 'YES','YES', 'NO', 'Follow up','YES','YES', 'NO','YES','YES']
                            }
                
    df = pd.DataFrame(employees)
    print(df)