Pandas filter by date: how do I filter a CSV by date?
Example CSV
User Dates Hours shift
User1 01.01.2012 5 aaa
User1 02.01.2012 5 aaa
User1 03.01.2012 2 bbb
User1 04.01.2012 3 aaa
.....
User1 12.03.2012 1 aaa
User1 13.03.2012 8 ccc
.....
User2 04.02.2012 4 aaa
User2 05.02.2012 3 bbb
End
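I can filter by user with
use = users.loc["User1"]
I can also sum all of the hours
print(use["Hours"].sum())
I can count his shifts
counts = use.loc[use['shift'] == 'aaa', 'Hours'].value_counts()
But I don't know how to combine a date filter with the statements above, for example counting all of User2's shifts in March, or summing all the hours User1 worked in February.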
I have more or less managed to filter the table by date and user
use['Date'] = pd.to_datetime(use['Date'], infer_datetime_format=True, exact=True)
mask = (use['Date'] > Start) & (use['Date'] <= End)
print(use.loc[mask])
But I can't figure out how to combine them. Desired output:
Overview March 2016
User1 made 3 aaa shifts
User1 worked 12h in March 2016
Update: I have made some progress
print(use[use['Date'] > '02.01.2012']['hours'].sum())
works, but it is not what I want. With:
print(use[use['Date'] > '02.01.2012'] & (use[use['Date'] < '02.05.2012'],['hours'].sum()))
I get
AttributeError: 'list' object has no attribute 'sum'
Answer 0 (score: 1)
I think you can use:
Start = '2012-01-01'
End = '2012-03-03'
use['Dates'] = pd.to_datetime(use['Dates'], dayfirst=True)
mask = (use['Dates'] > Start) & (use['Dates'] <= End) & (use['shift'] == 'aaa')
use1 = use.loc[mask]
print (use1)
User Dates Hours shift
1 User1 2012-01-02 5 aaa
3 User1 2012-01-04 3 aaa
6 User2 2012-02-04 4 aaa
use1 = use.query('Dates > @Start and Dates <= @End and shift == "aaa"')
print (use1)
User Dates Hours shift
1 User1 2012-01-02 5 aaa
3 User1 2012-01-04 3 aaa
6 User2 2012-02-04 4 aaa
print (mask.sum())
3
counts = use.loc[mask, 'Hours'].value_counts()
print (counts)
3 1
5 1
4 1
Name: Hours, dtype: int64
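Note that value_counts() on Hours reports how often each hour value occurs, which is not the same as the number of shifts or the total hours. A minimal sketch of the difference, assuming the question's sample rows sit in a frame named use with columns User, Dates, Hours and shift (the frame is rebuilt by hand here only so the snippet runs on its own):

```python
import pandas as pd

# Sample rows copied from the question's CSV (rebuilt by hand for this sketch)
use = pd.DataFrame({
    'User': ['User1'] * 6 + ['User2'] * 2,
    'Dates': ['01.01.2012', '02.01.2012', '03.01.2012', '04.01.2012',
              '12.03.2012', '13.03.2012', '04.02.2012', '05.02.2012'],
    'Hours': [5, 5, 2, 3, 1, 8, 4, 3],
    'shift': ['aaa', 'aaa', 'bbb', 'aaa', 'aaa', 'ccc', 'aaa', 'bbb'],
})
use['Dates'] = pd.to_datetime(use['Dates'], format='%d.%m.%Y')

Start = '2012-01-01'
End = '2012-03-03'
mask = (use['Dates'] > Start) & (use['Dates'] <= End) & (use['shift'] == 'aaa')

n_shifts = int(mask.sum())                       # number of matching rows (shifts)
total_hours = int(use.loc[mask, 'Hours'].sum())  # hours summed over those rows
print(n_shifts, total_hours)
```

Here mask.sum() counts the matching rows, while use.loc[mask, 'Hours'].sum() adds up their hours.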
Edit:
Start = '2012-01-01'
End = '2012-03-03'
use['Dates'] = pd.to_datetime(use['Dates'], dayfirst=True)
mask = (use['Dates'] > Start) & (use['Dates'] <= End)
use1 = use.loc[mask]
print (use1)
User Dates Hours shift
1 User1 2012-01-02 5 aaa
2 User1 2012-01-03 2 bbb
3 User1 2012-01-04 3 aaa
6 User2 2012-02-04 4 aaa
7 User2 2012-02-05 3 bbb
# named aggregation; passing a renaming dict to agg raises an error in recent pandas
counts = (use1.groupby(['User','shift'])['Hours']
              .agg(SUM='sum', COUNT='size')
              .reset_index())
print (counts)
User shift SUM COUNT
0 User1 aaa 8 2
1 User1 bbb 2 1
2 User2 aaa 4 1
3 User2 bbb 3 1
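If a per-month overview is wanted instead of a single Start/End window, the same groupby can take a month label as an extra key. This is only a sketch: the Month helper column is my own addition, not part of the question's data, and the frame is rebuilt by hand from the sample rows.

```python
import pandas as pd

# Sample rows copied from the question's CSV (rebuilt by hand for this sketch)
use = pd.DataFrame({
    'User': ['User1'] * 6 + ['User2'] * 2,
    'Dates': ['01.01.2012', '02.01.2012', '03.01.2012', '04.01.2012',
              '12.03.2012', '13.03.2012', '04.02.2012', '05.02.2012'],
    'Hours': [5, 5, 2, 3, 1, 8, 4, 3],
    'shift': ['aaa', 'aaa', 'bbb', 'aaa', 'aaa', 'ccc', 'aaa', 'bbb'],
})
use['Dates'] = pd.to_datetime(use['Dates'], format='%d.%m.%Y')

# Hypothetical helper column: a year-month label such as '2012-03'
use['Month'] = use['Dates'].dt.strftime('%Y-%m')

# One row per user, month and shift type, with summed hours and row count
monthly = (use.groupby(['User', 'Month', 'shift'])['Hours']
              .agg(SUM='sum', COUNT='size')
              .reset_index())
print(monthly)
```

Each row of monthly then answers "how many shifts of this type, and how many hours, did this user work in this month".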
EDIT1:
If you need more conditions, use loc:
print(use.loc[(use['Date'] > '02.01.2012') & (use['Date'] < '02.05.2012'),'hours'].sum())
0
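The 0 printed above would be consistent with the Date column still holding strings at that point (an assumption about the state of the frame), because strings are compared character by character rather than chronologically. A small sketch of the difference, using the question's sample values:

```python
import pandas as pd

use = pd.DataFrame({
    'Date': ['01.01.2012', '02.01.2012', '03.01.2012', '04.01.2012',
             '12.03.2012', '13.03.2012', '04.02.2012', '05.02.2012'],
    'hours': [5, 5, 2, 3, 1, 8, 4, 3],
})

# Text comparison: no string sorts strictly between the two bounds
as_text = use.loc[(use['Date'] > '02.01.2012') &
                  (use['Date'] < '02.05.2012'), 'hours'].sum()

# Parsed comparison: the bounds are read as 2 January and 2 May 2012
dates = pd.to_datetime(use['Date'], format='%d.%m.%Y')
lower = pd.Timestamp('2012-01-02')
upper = pd.Timestamp('2012-05-02')
as_dates = use.loc[(dates > lower) & (dates < upper), 'hours'].sum()

print(as_text, as_dates)
```

Parsing the column with pd.to_datetime before filtering is what makes the range condition behave as a date range.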
All together:
import pandas as pd

use = pd.DataFrame({
    'User': ['User1', 'User1', 'User1', 'User1', 'User1', 'User1', 'User2', 'User2'],
    'Date': ['01.01.2012', '02.01.2012', '03.01.2012', '04.01.2012',
             '12.03.2012', '13.03.2012', '04.02.2012', '05.02.2012'],
    'hours': [5, 5, 2, 3, 1, 8, 4, 3],
    'shift': ['aaa', 'aaa', 'bbb', 'aaa', 'aaa', 'ccc', 'aaa', 'bbb']})
print (use)
User Date hours shift
0 User1 01.01.2012 5 aaa
1 User1 02.01.2012 5 aaa
2 User1 03.01.2012 2 bbb
3 User1 04.01.2012 3 aaa
4 User1 12.03.2012 1 aaa
5 User1 13.03.2012 8 ccc
6 User2 04.02.2012 4 aaa
7 User2 05.02.2012 3 bbb
Start = '2012-01-01'
End = '2012-01-30'
User = 'User1'
shift = 'aaa'
use['Date'] = pd.to_datetime(use['Date'], dayfirst=True)
#how many Hours by dates (sum)
print(use.loc[(use['Date'] > Start) & (use['Date'] < End),'hours'].sum())
10
#how many Hours by dates and user (sum)
print(use.loc[(use['Date'] > Start) & (use['Date'] < End) &
(use['User'] == User),'hours'].sum())
10
#how many rows by dates and user (count)
print(((use['Date'] > Start) & (use['Date'] < End) &
(use['User'] == User)).sum())
3
#how many rows by dates and user and shift (count)
print(((use['Date'] > Start) & (use['Date'] < End) &
(use['User'] == User ) & (use['shift'] == shift)).sum())
2
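To get something close to the overview asked for in the question, the same masks can be wrapped in a small function. This is a sketch only: overview is a hypothetical helper name, and it selects a calendar month through the .dt accessor instead of Start/End strings.

```python
import pandas as pd

def overview(df, user, shift, year, month):
    """Hypothetical helper: count `shift` rows and sum hours for one user and month."""
    in_month = (df['Date'].dt.year == year) & (df['Date'].dt.month == month)
    by_user = in_month & (df['User'] == user)
    n_shifts = int((by_user & (df['shift'] == shift)).sum())
    total_hours = int(df.loc[by_user, 'hours'].sum())
    return n_shifts, total_hours

# Sample rows from the question (rebuilt by hand for this sketch)
use = pd.DataFrame({
    'User': ['User1'] * 6 + ['User2'] * 2,
    'Date': ['01.01.2012', '02.01.2012', '03.01.2012', '04.01.2012',
             '12.03.2012', '13.03.2012', '04.02.2012', '05.02.2012'],
    'hours': [5, 5, 2, 3, 1, 8, 4, 3],
    'shift': ['aaa', 'aaa', 'bbb', 'aaa', 'aaa', 'ccc', 'aaa', 'bbb'],
})
use['Date'] = pd.to_datetime(use['Date'], format='%d.%m.%Y')

n_shifts, total_hours = overview(use, 'User1', 'aaa', 2012, 3)
print('Overview March 2012')
print(f'User1 made {n_shifts} aaa shifts')
print(f'User1 worked {total_hours}h in March 2012')
```

The shift count uses only rows of the requested shift type, while the hour total covers all of the user's rows in that month.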
Answer 1 (score: 0)
You need to narrow down the dataset before running the aggregation.
use[use['Dates'] == '01.01.2012']['hours'].sum()
The first part of that line is the filter:
use[use['Dates'] == '01.01.2012']
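The second part, ['hours'].sum(), then aggregates whatever rows survived the filter. A short sketch of the two steps, assuming a small frame with the column names used in this answer and with Dates still stored as text (which is why the comparison with '01.01.2012' matches). The single .loc call at the end is an equivalent alternative, not part of the original answer:

```python
import pandas as pd

# Tiny illustrative frame; values taken from the question's first rows
use = pd.DataFrame({
    'Dates': ['01.01.2012', '02.01.2012', '03.01.2012'],
    'hours': [5, 5, 2],
})

filtered = use[use['Dates'] == '01.01.2012']  # step 1: narrow down the rows
total = filtered['hours'].sum()               # step 2: aggregate the column

# Same result in one indexing step: row mask and column label together
same_total = use.loc[use['Dates'] == '01.01.2012', 'hours'].sum()
print(total, same_total)
```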