Calculate the number of days in each month between two dates

Time: 2019-01-11 15:03:59

Tags: python date spss

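I'm looking for a function that takes 2 dates (admission and discharge) and a financial year, and returns the number of days in each month between those dates.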

The financial year runs 1 April -> 31 March.

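I currently have the solution below, which is a mess of SPSS and Python. Ultimately it will need to be implemented back into SPSS, but as a much tidier Python function. Unfortunately, this means it can only use the standard library (no Pandas).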

For example:

+-----------------+-----------------+------+--+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+
|    Admission    |    Discharge    |  FY  |  | Apr | May | Jun | Jul | Aug | Sep | Oct | Nov | Dec | Jan | Feb | Mar |
+-----------------+-----------------+------+--+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+
| 01 January 2017 | 05 January 2017 | 1617 |  |   0 |   0 |   0 |   0 |   0 |   0 |   0 |   0 |   0 |   4 |   0 |   0 |
| 01 January 2017 | 05 June 2017    | 1617 |  |   0 |   0 |   0 |   0 |   0 |   0 |   0 |   0 |   0 |  31 |  28 |  31 |
| 01 January 2017 | 05 June 2017    | 1718 |  |  30 |  31 |   4 |   0 |   0 |   0 |   0 |   0 |   0 |   0 |   0 |   0 |
| 01 January 2017 | 01 January 2019 | 1718 |  |  30 |  31 |  30 |  31 |  31 |  30 |  31 |  30 |  31 |  31 |  28 |  31 |
+-----------------+-----------------+------+--+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+
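
The rows in the table above can be reproduced with the standard library alone by clipping the stay against each month of the financial year. The following is a minimal sketch, not the asker's method: the name `beddays_by_month` and the fixed `MONTH_ABBR` list are hypothetical, and it assumes that `fy` is a four-digit code such as 1718 and that the discharge day itself is not counted (which is what the table implies).

```python
from datetime import date

# Fixed English abbreviations so the keys do not depend on the locale
MONTH_ABBR = ['Jan', 'Feb', 'Mar', 'Apr', 'May', 'Jun',
              'Jul', 'Aug', 'Sep', 'Oct', 'Nov', 'Dec']
FY_MONTHS = (4, 5, 6, 7, 8, 9, 10, 11, 12, 1, 2, 3)


def beddays_by_month(admission, discharge, fy):
    """Hypothetical helper: days of [admission, discharge) in each FY month."""
    start_year = 2000 + int(str(fy)[:2])  # e.g. 1718 -> 2017
    result = {}
    for m in FY_MONTHS:
        # April to December belong to the first calendar year of the FY
        year = start_year if m >= 4 else start_year + 1
        month_start = date(year, m, 1)
        # First day of the following month (January of next year after December)
        next_month = date(year + (m == 12), m % 12 + 1, 1)
        # Length of the overlap between the stay and this month, floored at 0
        overlap = (min(discharge, next_month) - max(admission, month_start)).days
        result[MONTH_ABBR[m - 1]] = max(overlap, 0)
    return result


print(beddays_by_month(date(2017, 1, 1), date(2017, 6, 5), 1718))
```

Because each month is handled with one subtraction, the cost does not grow with the length of the stay, and leap years are handled by `datetime.date` itself.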

Related: How to calculate number of days between two given dates?

Current solution (SPSS code)

 * Count the beddays.
 * Similar method to that used in Care homes.
 * 1) Declare an SPSS macro which will set the beddays for each month.
 * 2) Use python to run the macro with the correct parameters.
 * This means that different month lengths and leap years are handled correctly.
Define !BedDaysPerMonth (Month = !Tokens(1) 
   /MonthNum = !Tokens(1) 
   /DaysInMonth = !Tokens(1) 
   /Year = !Tokens(1))

 * Store the start and end date of the given month.
Compute #StartOfMonth = Date.DMY(1, !MonthNum, !Year).
Compute #EndOfMonth = Date.DMY(!DaysInMonth, !MonthNum, !Year).

 * Create the names of the variables e.g. April_beddays and April_cost.
!Let !BedDays = !Concat(!Month, "_beddays").

 * Create variables for the month.
Numeric !BedDays (F2.0).

 * Go through all possibilities to decide how many days to be allocated.
Do if keydate1_dateformat LE #StartOfMonth.
   Do if keydate2_dateformat GE #EndOfMonth.
      Compute !BedDays = !DaysInMonth.
   Else.
      Compute !BedDays = DateDiff(keydate2_dateformat, #StartOfMonth, "days").
   End If.
Else if keydate1_dateformat LE #EndOfMonth.
   Do if keydate2_dateformat GT #EndOfMonth.
      Compute !BedDays = DateDiff(#EndOfMonth, keydate1_dateformat, "days") + 1.
   Else.
      Compute !BedDays = DateDiff(keydate2_dateformat, keydate1_dateformat, "days").
   End If.
Else.
   Compute !BedDays = 0.
End If.

 * Months after the discharge date will end up with negatives.
If !BedDays < 0 !BedDays = 0.
!EndDefine.

 * This python program will call the macro for each month with the right variables.
 * They will also be in FY order.
Begin Program.
from calendar import month_name, monthrange
from datetime import date
import spss

#Set the financial year, this line reads the first variable ('year')
fin_year = int((int(spss.Cursor().fetchone()[0]) // 100) + 2000)

#This line generates a 'dictionary' which will hold all the info we need for each month
#month_name is a list of all the month names and just needs the number of the month
#(m < 4) + fin_year - This keeps the year as fin_year for April onwards and uses fin_year + 1 for January to March
#monthrange takes a year and a month number and returns 2 numbers: the weekday of the first day of the month and the number of days in the month, we only need the second.
months = {m: [month_name[m], (m < 4) + fin_year, monthrange((m < 4) + fin_year, m)[1]]  for m in range(1,13)}
print(months) #Print to the output window so you can see how it works

#This will make the output look a bit nicer
print("\n\n***This is the syntax that will be run:***")

#This loops over the months above but first sorts them by year, meaning they are in correct FY order
for month in sorted(months.items(), key=lambda x: x[1][1]):
   syntax = "!BedDaysPerMonth Month = " + month[1][0][:3]
   syntax += " MonthNum = " + str(month[0])
   syntax += " DaysInMonth = " + str(month[1][2])
   syntax += " Year = " + str(month[1][1]) + "."

   print(syntax)
   spss.Submit(syntax)
End Program.
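
To see what the dictionary and the sort in the program above produce, the same expressions can be run outside SPSS. This is only an illustrative sketch: `fin_year` is hard-coded to 2017 (an assumed value standing in for the one read through `spss.Cursor()`), and the sort key is extended to `(year, month number)` so the result does not depend on dictionary ordering.

```python
from calendar import month_name, monthrange

fin_year = 2017  # assumed value: FY 1718 starts in April 2017

# month number -> [month name, calendar year, days in that month]
months = {
    m: [month_name[m], (m < 4) + fin_year, monthrange((m < 4) + fin_year, m)[1]]
    for m in range(1, 13)
}

# Sort by (calendar year, month number) to get April..March order explicitly
fy_order = sorted(months.items(), key=lambda item: (item[1][1], item[0]))

for number, (name, year, days) in fy_order:
    print(number, name[:3], year, days)
```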

6 answers:

Answer 0 (score: 3)

The only way I can think of doing this is to loop through each day and work out which month it belongs to:

import time, collections
SECONDS_PER_DAY = 24 * 60 * 60
def monthlyBedDays(admission, discharge, fy=None):

    start = time.mktime(time.strptime(admission, '%d-%b-%Y'))
    end = time.mktime(time.strptime( discharge, '%d-%b-%Y'))
    if fy is not None:
        fy = str(fy)
        start = max(start, time.mktime(time.strptime('01-Apr-'+fy[:2], '%d-%b-%y')))
        end   = min(end,   time.mktime(time.strptime('31-Mar-'+fy[2:], '%d-%b-%y')))
    days = collections.defaultdict(int)
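    # NOTE: this steps through local time in fixed 24-hour jumps, so in time zones
    # with daylight saving a day near a clock change can be counted in the wrong place
    for day in range(int(start), int(end) + SECONDS_PER_DAY, SECONDS_PER_DAY):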
        day = time.localtime(day)
        key = time.strftime('%Y-%m', day)  # use '%b' to answer the question exactly, but that's not such a good idea
        days[ key ] += 1
    return days

output = monthlyBedDays(admission="01-Jan-2018", discharge="25-Apr-2018")
print(output)
# Prints:
# defaultdict(<class 'int'>, {'2018-01': 31, '2018-02': 28, '2018-03': 31, '2018-04': 25})

print(monthlyBedDays(admission="01-Jan-2018", discharge="25-Apr-2018", fy=1718))
# Prints:
# defaultdict(<class 'int'>, {'2018-01': 31, '2018-02': 28, '2018-03': 31})

print(monthlyBedDays(admission="01-Jan-2018", discharge="25-Apr-2018", fy=1819))
# Prints:
# defaultdict(<class 'int'>, {'2018-04': 25})

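Note that the output is a defaultdict, so if you ask it for the number of days in a month that was not recorded (a key that does not exist at all, e.g. output['1999-12']), it will return 0. Note also that I have used the '%Y-%m' format for the output keys. Compared with the type of key originally asked for ('%b' -> 'Jan'), this makes it much easier to sort the output and to disambiguate between months that occur in different years.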
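
A short sketch of the two points above, using made-up counts rather than the function's real output. One detail worth knowing: indexing a `defaultdict` with a missing key returns 0 but also stores that key, whereas `.get()` does not.

```python
import collections

days = collections.defaultdict(int)
days['2019-01'] += 17   # illustrative values only
days['2018-12'] += 31

missing = days['1999-12']            # returns 0 and inserts the key
print(missing, '1999-12' in days)    # 0 True

peek = days.get('2000-01', 0)        # returns 0 without inserting
print(peek, '2000-01' in days)       # 0 False

# '%Y-%m' keys sort chronologically as plain strings
recorded = sorted(key for key, count in days.items() if count)
print(recorded)                      # ['2018-12', '2019-01']
```

With '%b' keys, a stay running from January 2018 to January 2019 would add both Januaries into the same 'Jan' entry.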

Answer 1 (score: 2)

First of all, I suggest using datetime.date instances, so you can parse your dates with something like this:

import datetime
date = datetime.datetime.strptime('17-Jan-2018', '%d-%b-%Y').date()

Then you can use something like this to iterate over the date range:

import datetime
import collections

def f(start_date, end_date, fy_str):
    # if the date range falls outside the financial year, cut it off
    fy_start = datetime.date(2000 + int(fy_str[:2]), 4, 1)
    if start_date < fy_start:
        start_date = fy_start
    fy_end = datetime.date(2000 + int(fy_str[2:]), 3, 31)
    if end_date > fy_end:
        end_date = fy_end

    month_dict = collections.defaultdict(int)

    date = start_date
    while date <= end_date:
        # the key holds year and month to make sorting easier
        key = '{}-{:02d}'.format(date.year, date.month)

        month_dict[key] += 1
        date += datetime.timedelta(days=1)

    return month_dict

Usage looks like this:

>>> d1 = datetime.date(2018, 2, 5)
>>> d2 = datetime.date(2019, 1, 17)


>>> r = f(d1, d2, '1718')
>>> for k, v in sorted(r.items()):
...     print(k, v)
2018-02 24
2018-03 31

>>> r = f(d1, d2, '1819')
>>> for k, v in sorted(r.items()):
...     print(k, v)
2018-04 30
2018-05 31
2018-06 30
2018-07 31
2018-08 31
2018-09 30
2018-10 31
2018-11 30
2018-12 31
2019-01 17

Answer 2 (score: 1)

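I think many of the answers were written before the OP gave the key information about how fy plays a role in the function (edit: many people have since read that edit and updated their answers). The OP wants the number of days between admission and discharge that fall within the financial year (1819 is 1 April 2018 to 31 March 2019). And, obviously, the days need to be split up by calendar month.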

from datetime import datetime, timedelta

# Function taken from https://stackoverflow.com/a/13565185/9462009
def lastDateOfThisMonth(any_day):
    next_month = any_day.replace(day=28) + timedelta(days=4)
    return next_month - timedelta(days=next_month.day)

def monthlyBeddays(admission, discharge, fy):
    startFy = datetime.strptime('01-Apr-'+fy[:2], '%d-%b-%y')
    endFy = datetime.strptime('01-Apr-'+fy[2:], '%d-%b-%y')

    admissionDate = datetime.strptime(admission, '%d-%b-%Y')
    dischargeDate = datetime.strptime(discharge, '%d-%b-%Y')


    monthDates = {'Jan':0,'Feb':0,'Mar':0,'Apr':0,'May':0,'Jun':0,'Jul':0,'Aug':0,'Sep':0,'Oct':0,'Nov':0,'Dec':0}

    # if admitted after end of fy or discharged before beginning of fy, zero days counted
    if admissionDate > endFy or dischargeDate < startFy:
        return monthDates

    if admissionDate < startFy:
        # Jump ahead to start at the first day of fy if admission was prior to the beginning of fy
        now = startFy
    else:
        # If admission happened at or after the first day of fy, we begin counting from the admission date
        now = admissionDate

    while True:
        month = datetime.strftime(now,'%b')
        lastDateOfMonth = lastDateOfThisMonth(now)
        if now >= endFy:
            # If now is greater or equal to the first day of the next fy (endFy), we don't care about any of the following dates within the adm/dis date range
            break
        if month == datetime.strftime(dischargeDate,'%b') and datetime.strftime(now, '%Y') == datetime.strftime(dischargeDate, '%Y') and now >= startFy:
            # If we reach the discharge month, we count this month and we're done
            monthDates[month] = (dischargeDate - now).days # not adding one since in your example it seemed like you did not want to count the dischargeDate (Mar:4)
            break
        elif now < startFy:
            # If now is less than the first day of this fy (startFy), we move on from this month to the next month until we reach this fy
            pass
        else:
            # We are within this fy and have not reached the discharge month yet
            monthDates[month] = (lastDateOfMonth - now).days + 1
            month = datetime.strftime(now, '%b')
        now = lastDateOfMonth + timedelta(days=1) # advance to the 1st of the next month

    return monthDates

# Passes all six scenarios

# Scenario #1: admitted before fy, discharged before  fy (didn't stay at all during fy)
print(monthlyBeddays("01-Jan-2018", "30-Mar-2018", '1819')) # {'Jan': 0, 'Feb': 0, 'Mar': 0, 'Apr': 0, 'May': 0, 'Jun': 0, 'Jul': 0, 'Aug': 0, 'Sep': 0, 'Oct': 0, 'Nov': 0, 'Dec': 0}

# Scenario #2: admitted before fy, discharged during fy
print(monthlyBeddays("01-Jan-2018", "30-May-2018", '1819')) # {'Jan': 0, 'Feb': 0, 'Mar': 0, 'Apr': 30, 'May': 29, 'Jun': 0, 'Jul': 0, 'Aug': 0, 'Sep': 0, 'Oct': 0, 'Nov': 0, 'Dec': 0}

# Scenario #3: admitted during fy, discharged during fy
print(monthlyBeddays("15-Apr-2018", "30-May-2018", '1819')) # {'Jan': 0, 'Feb': 0, 'Mar': 0, 'Apr': 16, 'May': 29, 'Jun': 0, 'Jul': 0, 'Aug': 0, 'Sep': 0, 'Oct': 0, 'Nov': 0, 'Dec': 0}

# Scenario #4: admitted during fy, discharged after fy
print(monthlyBeddays("15-Apr-2018", "30-May-2019", '1819')) # {'Jan': 31, 'Feb': 28, 'Mar': 31, 'Apr': 16, 'May': 31, 'Jun': 30, 'Jul': 31, 'Aug': 31, 'Sep': 30, 'Oct': 31, 'Nov': 30, 'Dec': 31}

# Scenario #5: admitted before fy, discharged after fy (stayed the whole fy)
print(monthlyBeddays("15-Mar-2018", "30-May-2019", '1819')) # {'Jan': 31, 'Feb': 28, 'Mar': 31, 'Apr': 30, 'May': 31, 'Jun': 30, 'Jul': 31, 'Aug': 31, 'Sep': 30, 'Oct': 31, 'Nov': 30, 'Dec': 31}

# Scenario #6: admitted near the end of fy, discharged after fy (only 15-31 March falls within fy)
print(monthlyBeddays("15-Mar-2018", "30-May-2019", '1718')) # {'Jan': 0, 'Feb': 0, 'Mar': 17, 'Apr': 0, 'May': 0, 'Jun': 0, 'Jul': 0, 'Aug': 0, 'Sep': 0, 'Oct': 0, 'Nov': 0, 'Dec': 0}

Answer 3 (score: 0)

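Here is the solution I came up with. As I understand it, you want the number of days in each month between two given dates. I have not formatted the month (I left it as a number), but that should be easy enough for you to do.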

from datetime import date
from calendar import monthrange
from dateutil.relativedelta import relativedelta  # third-party package (python-dateutil)

#start and end dates
d0 = date(2008, 8, 18)
d1 = date(2008, 12, 26)
delta = d1 - d0
delta_days = delta.days #number of days between the two dates

#we create a copy of the start date so we can use it to iterate (so as to not to lose the initial date)
curr_d = d0
while True:
    #we iterate over each month until we have no days left

    #days left in the current month, counting from curr_d itself
    days_left_in_month = monthrange(curr_d.year, curr_d.month)[1] - curr_d.day + 1

    #if there are more days in delta_days than are left in the month
    #the number of days in the current month is the number of days left in that month
    if delta_days > days_left_in_month:
        number_of_days_in_curr_month = days_left_in_month
        delta_days -= days_left_in_month

    #the delta_days is no larger than the days left in the current month
    #the number of days in the current month is thus == to delta_days
    #we exit the while loop here
    else:
        number_of_days_in_curr_month = delta_days
        print('month number: ' + str(curr_d.month) + ', year: ' + str(curr_d.year) + ', days: ' + str(number_of_days_in_curr_month) )
        break
    print('month number: ' + str(curr_d.month) + ', year: ' + str(curr_d.year) + ', days: ' + str(number_of_days_in_curr_month) )

    #we move to the first day of the next month
    curr_d = curr_d.replace(day=1) + relativedelta(months=+1)
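
Since the question is restricted to the standard library, the month step can also be written without `dateutil`. A small sketch of one common way to do it; `first_of_next_month` is a hypothetical helper name, and the approach mirrors the "day 28 plus 4 days" trick used in another answer here.

```python
from datetime import date, timedelta


def first_of_next_month(d):
    """Hypothetical stdlib-only replacement for stepping one month forward."""
    # Day 28 exists in every month, and 28 + 4 days always lands in the next one
    somewhere_next_month = d.replace(day=28) + timedelta(days=4)
    return somewhere_next_month.replace(day=1)


print(first_of_next_month(date(2008, 12, 26)))
```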

Answer 4 (score: 0)

This iterates over months rather than days. It uses core library modules plus `dateutil`, which is a third-party package:

from calendar import monthrange, month_name
from datetime import datetime
from dateutil.relativedelta import relativedelta

def days_by_month(admission, discharge):
    #Returns a dictionary with months and count of days that fall into them

    def fin_year_check(start_month, x):

        if start_month >= 4:
            return 4 <= x <= 15
        if start_month < 4:
            return 1 <= x < 4

    def modulo(x):
        #modulo modified

        if x%12 == 0:
            return 12
        return x%12

    date_format = "%Y-%m-%d"

    admission_date = datetime.strptime(admission, date_format)
    discharge_date = datetime.strptime(discharge, date_format)

    year = admission_date.year
    start_day = admission_date.day
    start_month = admission_date.month
    end_day = discharge_date.day

    num_of_months = (relativedelta(discharge_date, admission_date).years * 12
                     + relativedelta(discharge_date, admission_date).months)
    days_in_first_month = monthrange(admission_date.year,admission_date.month)[1]-start_day
    days_in_last_month = end_day

    months = [month_name[modulo(x)] for x in 
                 range(admission_date.month, admission_date.month + num_of_months + 1)
                 if fin_year_check(start_month, x)]

    full_days = []

    for x in range(admission_date.month, admission_date.month + num_of_months):
        if fin_year_check(start_month, x):
            fin_year = year + 1 if x > 12 else year
            full_days.append(monthrange(fin_year, modulo(x))[1])

    all_days = [days_in_first_month, *full_days[1:], days_in_last_month]

    result = dict(zip(months, all_days))
    return result

Some sample tests:

days_by_month("2018-01-01", "2018-03-30")
#>>>{'January': 30, 'February': 28, 'March': 30}

days_by_month("2018-01-01", "2018-05-30")
#>>>{'January': 30, 'February': 28, 'March': 31}

days_by_month("2018-04-15", "2018-05-30")
#>>>{'April': 15, 'May': 30}

Answer 5 (score: 0)

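Thanks for all the great answers. I tried re-implementing some of them back into SPSS, but it quickly became very complicated, and trying to pass values between the two became pointless...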

I did come up with a neat function for parsing an SPSS date variable into a Python datetime object:

from datetime import datetime, timedelta

def SPSS_to_Python_date(date):
    spss_start_date = datetime(1582, 10, 14)
    return (spss_start_date + timedelta(seconds = date))
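
The conversion works because SPSS stores a date as the number of seconds since midnight on 14 October 1582. A minimal sketch of the same idea with an inverse added for round-tripping; `python_to_spss_date` and the constant `SPSS_EPOCH` are hypothetical names that do not appear in the answer.

```python
from datetime import datetime, timedelta

SPSS_EPOCH = datetime(1582, 10, 14)  # start of the SPSS date scale


def spss_to_python_date(seconds):
    # Same calculation as SPSS_to_Python_date in the answer
    return SPSS_EPOCH + timedelta(seconds=seconds)


def python_to_spss_date(dt):
    # Hypothetical inverse: seconds elapsed since the SPSS epoch
    return (dt - SPSS_EPOCH).total_seconds()


admission = datetime(2018, 4, 1)
as_spss = python_to_spss_date(admission)
print(as_spss, spss_to_python_date(as_spss))
```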

As for the main problem, after some thought I managed to simplify (I think) my original solution and make it more robust.

Define !BedDaysPerMonth (Month_abbr = !Tokens(1) 
    /AdmissionVar = !Default(keydate1_dateformat) !Tokens(1)
    /DischargeVar = !Default(keydate2_dateformat) !Tokens(1)
    /DelayedDischarge = !Default(0) !Tokens(1))

 * Compute the month number from the name abbreviation.
Compute #MonthNum = xdate.Month(Number(!Quote(!Concat(!Month_abbr, "-00")), MOYR6)).

 * Find out which year we need e.g for FY 1718: Apr - Dec = 2017, Jan - Mar = 2018.
Do if (#MonthNum >= 4).
    Compute #Year = !Concat("20", !substr(!Unquote(!Eval(!FY)), 1, 2)).
Else.
    Compute #Year = !Concat("20", !substr(!Unquote(!Eval(!FY)), 3, 2)).
End if.

 * Now we have the year work out the start and end dates for the month.
Compute #StartOfMonth = Date.DMY(1, #MonthNum, #Year).
Compute #EndOfMonth = Date.DMY(1, #MonthNum + 1, #Year) - time.days(1).

 * Set the names of the variable for this month e.g. April_beddays.
 * And then create the variable.
!Let !BedDays = !Concat(!Month_abbr, "_beddays").
Numeric !BedDays (F2.0).

 * Go through all possibilities to decide how many days to be allocated.
Do if !AdmissionVar LE #StartOfMonth.
    Do if !DischargeVar GT #EndOfMonth.
        * They were in hospital throughout this month.
        * This will be the maximum number of days in the month.
        Compute !BedDays = DateDiff(#EndOfMonth, #StartOfMonth, "days") + 1.
    Else if !DischargeVar LE #StartOfMonth.
        * The whole record occurred before the month began.
        Compute !BedDays = 0.
    Else.
        * They were discharged at some point in the month.
        Compute !BedDays = DateDiff(!DischargeVar, #StartOfMonth, "days").
    End If.
 * If we're here they were admitted during the month.
Else if !AdmissionVar LE #EndOfMonth.
    Do if !DischargeVar GT #EndOfMonth.
        Compute !BedDays = DateDiff(#EndOfMonth, !AdmissionVar, "days") + 1.
    Else.
        * Admitted and discharged within this month.
        Compute !BedDays = DateDiff(!DischargeVar, !AdmissionVar, "days").
    End If.
Else.
    * They were admitted in a future month.
    Compute !BedDays = 0.
End If.

 * If we are looking at Delayed Discharge records, we should count the last day and not the first.
 * We achieve this by taking a day from the first month and adding it to the last.
!If (!DelayedDischarge = 1) !Then
    Do if xdate.Month(!AdmissionVar) = xdate.Month(date.MOYR(#MonthNum, #Year))
        and xdate.Year(!AdmissionVar) =  #Year.
        Compute !BedDays = !BedDays - 1.
    End if.

    Do if xdate.Month(!DischargeVar) = xdate.Month(date.MOYR(#MonthNum, #Year))
        and xdate.Year(!DischargeVar) =  #Year.
        Compute !BedDays = !BedDays + 1.
    End if.
!ifEnd.

 * Tidy up the variable.
Variable Width !Beddays (5).
Variable Level !Beddays (Scale).

!EndDefine.

This can then (optionally) be run using the following Python code.

from calendar import month_name
import spss

#Loop through the months by number in FY order
for month in (4, 5, 6, 7, 8, 9, 10, 11, 12, 1, 2, 3):
   #To show what is happening print some stuff to the screen
   print(month, month_name[month])

   #Set up the syntax
   syntax = "!BedDaysPerMonth Month_abbr = " + month_name[month][:3]

   #print the syntax to the screen
   print(syntax)

   #run the syntax
   spss.Submit(syntax)