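I'm exploring this drone rental dataset in Python and trying to use GroupBy so that a Result column shows what each drone produced in each month.
If each result were tied to a single date I could normally do this, but because this is a long-term rental business, I need to work out how much of each result is attributable to every month between the start and end dates.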
+-------+------------------+------------------+--------+
| Drone | Start            | End              | Result |
+-------+------------------+------------------+--------+
| DR1   | 16/06/2013 10:30 | 22/08/2013 07:00 | 2786   |
| DR1   | 20/04/2013 23:30 | 16/06/2013 10:30 | 7126   |
| DR1   | 24/01/2013 23:00 | 20/04/2013 23:30 | 2964   |
| DR2   | 01/03/2014 19:00 | 07/05/2014 18:00 | 8884   |
| DR2   | 04/09/2015 09:00 | 04/11/2015 07:00 | 7828   |
| DR2   | 04/10/2013 05:00 | 24/12/2013 07:00 | 5700   |
+-------+------------------+------------------+--------+
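I can find the difference between the dates with the following:
import pandas as pd
from dateutil.relativedelta import relativedelta

df.Start = pd.to_datetime(df.Start)
df.End = pd.to_datetime(df.End)
a = df.loc[0, 'Start']
b = df.loc[0, 'End']
relativedelta(a, b)
But the output looks like this:
relativedelta(months=-2, days=-5, hours=-20, minutes=-30)
and I can't use it with GroupBy to compute the attribution, the way I could if the dataset had only a single date:
df.groupby(['Device', 'Date']).agg(sum)['Result']
I would appreciate some help with the right way to think about this kind of problem and with what the code would look like.
Taking the first example for each drone, my expected output would be: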
+-------+--------+------+--------+
| Drone | Month  | Days | Result |
+-------+--------+------+--------+
| DR1   | June   | X    | $YY    |
| DR1   | July   | X    | $YY    |
| DR1   | August | X    | $YY    |
| DR2   | March  | Y    | $ZZ    |
| DR2   | April  | Y    | $ZZ    |
| DR2   | May    | Y    | $ZZ    |
+-------+--------+------+--------+
Thanks

Answer 0 (score: 3)
This is a loop-based solution, but I think it does what you want.
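# Just load the sample data
from io import StringIO

data = 'Drone,Start,End,Result\n' + \
       'DR1,16/06/2013 10:30,22/08/2013 07:00,2786\n' + \
       'DR1,20/04/2013 23:30,16/06/2013 10:30,7126\n' + \
       'DR1,24/01/2013 23:00,20/04/2013 23:30,2964\n' + \
       'DR2,01/03/2014 19:00,07/05/2014 18:00,8884\n' + \
       'DR2,04/09/2015 09:00,04/11/2015 07:00,7828\n' + \
       'DR2,04/10/2013 05:00,24/12/2013 07:00,5700\n'
stream = StringIO(data)

# Actual solution
import pandas as pd
from datetime import datetime

# The dates are day-first (dd/mm/yyyy); dayfirst=True keeps ambiguous
# dates such as 04/09/2015 from being read as April 9.
df = pd.read_csv(stream, sep=',', parse_dates=[1, 2], dayfirst=True)

def get_month_spans(row):
    """Split one rental into per-month pieces, weighting Result by elapsed time."""
    month_spans = []
    start = row['Start']
    total_delta = (row['End'] - row['Start']).total_seconds()
    while row['End'] > start:
        # First instant of the month after `start`
        if start.month != 12:
            end = datetime(year=start.year, month=start.month + 1, day=1)
        else:
            end = datetime(year=start.year + 1, month=1, day=1)
        # Do not run past the end of the rental
        if end > row['End']:
            end = row['End']
        delta = (end - start).total_seconds()
        proportional = row['Result'] * (delta / total_delta)
        month_spans.append({'Drone': row['Drone'],
                            'Month': datetime(year=start.year,
                                              month=start.month,
                                              day=1),
                            'Result': proportional,
                            'Days': delta / (24 * 3600)})
        start = end
    return month_spans

month_spans = []
for index, row in df.iterrows():
    month_spans += get_month_spans(row)

# Rentals that share a month (for example, two DR1 rentals in June 2013) are summed
monthly = pd.DataFrame(month_spans).groupby(['Drone', 'Month'])[['Result', 'Days']].sum()
print(monthly)
This outputs the result per drone per month, along with the number of days. The listing below was produced without dayfirst=True, so the ambiguous DR2 dates were read month-first (04/09/2015 to 04/11/2015 became April 9 to April 11, hence the 2015-04-01 row of 1.92 days); with day-first parsing the DR2 rows will differ, while the DR1 rows are unaffected: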
                       Result       Days
Drone Month
DR1   2013-01-01   242.633083   7.041667
      2013-02-01   964.789537  28.000000
      2013-03-01  1068.159845  31.000000
      2013-04-01  1953.216797  30.000000
      2013-05-01  3912.726199  31.000000
      2013-06-01  2555.334620  30.000000
      2013-07-01  1291.856653  31.000000
      2013-08-01   887.283266  21.291667
DR2   2013-04-01   459.202454  20.791667
      2013-05-01   684.662577  31.000000
      2013-06-01   662.576687  30.000000
      2013-07-01   684.662577  31.000000
      2013-08-01   684.662577  31.000000
      2013-09-01   662.576687  30.000000
      2013-10-01   684.662577  31.000000
      2013-11-01   662.576687  30.000000
      2013-12-01   514.417178  23.291667
      2014-01-01  1369.726258  28.208333
      2014-02-01  1359.610112  28.000000
      2014-03-01  1505.282624  31.000000
      2014-04-01  1456.725120  30.000000
      2014-05-01  1505.282624  31.000000
      2014-06-01  1456.725120  30.000000
      2014-07-01   230.648144   4.750000
      2015-04-01  7828.000000   1.916667
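
To sanity-check the proportional split on a single rental, here is a minimal sketch that uses only the standard library. The helper name `split_by_month` and the dictionary keys are my own choices for illustration, not part of pandas or of the code above. It assumes the result accrues evenly over time and parses the dates explicitly as day-first.

```python
from datetime import datetime

def split_by_month(start, end, result):
    """Split `result` across calendar months in proportion to elapsed time.

    Hypothetical helper: assumes the result accrues evenly between start and end.
    """
    total = (end - start).total_seconds()
    spans = []
    cursor = start
    while cursor < end:
        # First instant of the month after `cursor`
        if cursor.month == 12:
            boundary = datetime(cursor.year + 1, 1, 1)
        else:
            boundary = datetime(cursor.year, cursor.month + 1, 1)
        stop = min(boundary, end)  # never run past the end of the rental
        seconds = (stop - cursor).total_seconds()
        spans.append({
            "month": cursor.strftime("%Y-%m"),
            "days": seconds / 86400,
            "result": result * seconds / total,
        })
        cursor = stop
    return spans

# First DR1 rental from the sample data, parsed explicitly as day-first
fmt = "%d/%m/%Y %H:%M"
spans = split_by_month(datetime.strptime("16/06/2013 10:30", fmt),
                       datetime.strptime("22/08/2013 07:00", fmt),
                       2786)
for span in spans:
    print(span["month"], round(span["days"], 4), round(span["result"], 2))
```

For this rental the July and August shares come to roughly 1291.86 and 887.28, which agrees with the DR1 rows for 2013-07-01 and 2013-08-01 in the listing above. The June share is only part of the DR1 June row, because the rental ending on 16/06/2013 also contributes to that month.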