I'm trying to get from this DataFrame:
df = pd.DataFrame({'Start Time': ['27/02/2018 12:56', '27/02/2018 12:56', '27/02/2018 12:51', '28/02/2018 12:51', '28/02/2018 12:46', '28/02/2018 12:46', '28/02/2018 12:41', '28/02/2018 12:41', '01/03/2018 12:36', '01/03/2018 12:36', '01/03/2018 12:31', '01/03/2018 12:31', '02/03/2018 12:27', '02/03/2018 12:27', '02/03/2018 12:27', '02/03/2018 12:27'], 'Event_type': ['Transfer', 'Transfer', 'Transfer', 'Transfer', 'Transfer', 'Transfer', 'Transfer', 'Transfer', 'Transfer', 'Transfer', 'Transfer', 'Transfer', 'Transfer', 'Transfer', 'Transfer', 'Transfer'], 'Status': ['S', 'S', 'S', 'S', 'F', 'S', 'F', 'S', 'F', 'S', 'S', 'F', 'S', 'S', 'F', 'F'], 'Job Number': [1000000000000.0, 1000000000000.0, 1000000000000.0, 1000000000000.0, 1000000000000.0, 1000000000000.0, 1000000000000.0, 1000000000000.0, 1000000000000.0, 1000000000000.0, 1000000000000.0, 1000000000000.0, 1000000000000.0, 1000000000000.0, 1000000000000.0, 1000000000000.0]}, columns=['Job Number','Start Time','Event_type','Status'])
print (df)
      Job Number        Start Time  Event_type  Status
0   1.000000e+12  27/02/2018 12:56    Transfer       S
1   1.000000e+12  27/02/2018 12:56    Transfer       S
2   1.000000e+12  27/02/2018 12:51    Transfer       S
3   1.000000e+12  28/02/2018 12:51    Transfer       S
4   1.000000e+12  28/02/2018 12:46    Transfer       F
5   1.000000e+12  28/02/2018 12:46    Transfer       S
6   1.000000e+12  28/02/2018 12:41    Transfer       F
7   1.000000e+12  28/02/2018 12:41    Transfer       S
8   1.000000e+12  01/03/2018 12:36    Transfer       F
9   1.000000e+12  01/03/2018 12:36    Transfer       S
10  1.000000e+12  01/03/2018 12:31    Transfer       S
11  1.000000e+12  01/03/2018 12:31    Transfer       F
12  1.000000e+12  02/03/2018 12:27    Transfer       S
13  1.000000e+12  02/03/2018 12:27    Transfer       S
14  1.000000e+12  02/03/2018 12:27    Transfer       F
15  1.000000e+12  02/03/2018 12:27    Transfer       F
to this:
Status       F   S  Grand Total
Start Time
2018-01-03   2   2            4
2018-02-03   2   2            4
2018-02-27   0   3            3
2018-02-28   2   3            5
Grand Total  6  10           16
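What I need is to count, for each date, the destination file names flagged with status 'S'. The status can only be 'S' or 'F'.
The code I'm currently using is: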
import pandas as pd

df = pd.read_csv('JobFileAuditLogs20180227_B.csv', encoding='utf-8')
# Keep only the date part of each timestamp
df['Start Time'] = pd.to_datetime(df['Start Time']).dt.date
df.to_csv('JobFileAuditLogs20180227_C.csv', sep=',', encoding='utf-8')
df = pd.read_csv('JobFileAuditLogs20180227_C.csv', index_col='Start Time',
                 encoding='utf-8')
df[['Status', 'Destination File Name']]
I tried using
df['Status'].value_counts()
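but that only gives the total number of occurrences of S and F, not how many occurred on each day.
I don't know how to proceed from here; any help would be great.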
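To illustrate the problem, here is a minimal sketch that rebuilds only the Status column of the sample data (the `status` Series is a stand-in for `df['Status']`, not the real file). The dates play no part in the result:

```python
import pandas as pd

# Status values copied from the 16 sample rows, in order
status = pd.Series(list('SSSSFSFSFSSFSSFF'), name='Status')

# value_counts() tallies each distinct value over the whole column,
# so all rows collapse into two totals (S: 10, F: 6) with no per-day split
counts = status.value_counts()
print(counts)
```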
Answer 0 (score: 1)
I believe you need crosstab:
df = pd.crosstab(pd.to_datetime(df['Start Time']).dt.date,
                 df['Status'],
                 margins=True,
                 margins_name='Grand Total')
print (df)
Status       F   S  Grand Total
Start Time
2018-01-03   2   2            4
2018-02-03   2   2            4
2018-02-27   0   3            3
2018-02-28   2   3            5
Grand Total  6  10           16
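One caveat, assuming the timestamps are written day/month/year (which values like '27/02/2018' suggest): in the output above, '01/03/2018' shows up as 2018-01-03, so it was read month-first. A minimal sketch on a shortened sample, passing an explicit format to to_datetime so that those rows land in March. The format string is my assumption about the file's layout, and `dates` and `result` are just illustrative names:

```python
import pandas as pd

# Shortened sample using values from the question (day/month/year assumed)
df = pd.DataFrame({
    'Start Time': ['27/02/2018 12:56', '28/02/2018 12:51',
                   '01/03/2018 12:36', '01/03/2018 12:31',
                   '02/03/2018 12:27'],
    'Status': ['S', 'S', 'F', 'S', 'F'],
})

# An explicit format removes the day/month ambiguity
dates = pd.to_datetime(df['Start Time'], format='%d/%m/%Y %H:%M').dt.date

result = pd.crosstab(dates, df['Status'],
                     margins=True, margins_name='Grand Total')
print(result)
```

If the layout of the file is not fixed, pd.to_datetime(df['Start Time'], dayfirst=True) is an alternative worth trying.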