data.CSV
ID   Activity Month   Activity Date
0    04/2019          04-01-2019
1    05/2019          05-13-2019
2    05/2019          05-25-2019
3    06/2019          06-10-2019
4    06/2019          06-19-2019
5    07/2019          07-15-2019
6    07/2019          07-18-2019
7    07/2019          07-29-2019
8    08/2019          06-03-2019
9    08/2019          06-15-2019
10   08/2019          06-20-2019
My plan
Read the CSV:
import pandas as pd
df = pd.read_csv('data.CSV')
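Convert to datetime:
# The dates are month-first (e.g. 05-13-2019), so parse them with an explicit format.
df['Activity Date'] = pd.to_datetime(df['Activity Date'], format='%m-%d-%Y')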
Group by the Activity Month column:
grouped = df.groupby(['Activity Month'])['Activity Date'].count()
print(grouped)
Activity Month
04/2019 15532
05/2019 13924
06/2019 12822
07/2019 14067
08/2019 10939
Name: Activity Date, dtype: int64
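With the dates grouped, perform the business-day calculation:
This is the part I'm not sure how to do. I'm lost.
The code I use to calculate business days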
import calendar
import datetime
x = datetime.date(2019, 4, 1)
cal = calendar.Calendar()
working_days = len([x for x in cal.itermonthdays2(x.year, x.month) if x[0] !=0 and x[1] < 5])
print ("Total business days for month (" + str(x.month) + ") is " + str(working_days) + " days")
My desired output
Total business days for month (4) is 22 days
Total business days for month (5) is 23 days
Total business days for month (6) is 20 days
Total business days for month (7) is 23 days
Total business days for month (8) is 22 days
Answer 0 (score: 1)
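The problem statement isn't entirely clear to me, but if you want the number of business days for each Activity Month, you can wrap your calculation in a function and apply that function to the Activity Month column (apply essentially runs a for loop over each row of the specified column).
import calendar
import datetime
grouped = df.groupby(['Activity Month'])['Activity Date'].count().reset_index()
def get_business_days(x):
    # x is a string such as '04/2019': month first, then year
    month, year = x.split('/')
    first_day = datetime.date(int(year), int(month), 1)
    cal = calendar.Calendar()
    # itermonthdays2 yields (day, weekday) pairs; day 0 is padding, weekdays 0-4 are Mon-Fri
    working_days = len([d for d in cal.itermonthdays2(first_day.year, first_day.month) if d[0] != 0 and d[1] < 5])
    return ("Total business days for month (" + str(first_day.month) + ") is " + str(working_days) + " days")
grouped['Activity Month'].apply(get_business_days)
The output is a Series of strings:
0 Total business days for month (4) is 22 days
1 Total business days for month (5) is 23 days
2 Total business days for month (6) is 20 days
3 Total business days for month (7) is 23 days
4 Total business days for month (8) is 22 days
However, storing the same repeated text in every cell is a bad idea. It is better to return just working_days rather than embedding it in a string.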
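As a rough sketch of that last suggestion, the function could return a plain integer instead of a sentence. The name business_days_in_month and the months list are assumptions made for illustration (the list just repeats the month labels from the question's grouped output), and public holidays are not taken into account.

```python
import calendar


def business_days_in_month(activity_month: str) -> int:
    """Count the Monday-Friday days in a month given as 'MM/YYYY'.

    Hypothetical helper for illustration; holidays are not excluded.
    """
    month, year = (int(part) for part in activity_month.split("/"))
    # itermonthdays2 yields (day_of_month, weekday) pairs. Day 0 marks
    # padding days from adjacent months; weekdays 0-4 are Monday-Friday.
    return sum(
        1
        for day, weekday in calendar.Calendar().itermonthdays2(year, month)
        if day != 0 and weekday < 5
    )


# Month labels copied from the question's grouped output.
months = ["04/2019", "05/2019", "06/2019", "07/2019", "08/2019"]
business_days = {m: business_days_in_month(m) for m in months}
print(business_days)
# {'04/2019': 22, '05/2019': 23, '06/2019': 20, '07/2019': 23, '08/2019': 22}
```

With the grouped frame from above, the same function could presumably be used as grouped['Business Days'] = grouped['Activity Month'].apply(business_days_in_month), where the column name Business Days is only a placeholder. Keeping the value numeric means it can still be summed, compared, or formatted into text later.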