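Optimising the conversion of a 2-dim numpy array of day-of-year (doy) values into an array of date values

Time: 2016-10-13 15:00:13

Tags: python arrays datetime

Does anyone know how to optimise the conversion of a 2-dim numpy array of day-of-year (doy) values into an array of date values? The function below works, but unfortunately in a very inelegant way. I would be glad if someone has a good idea for avoiding the loop over the 2-dim array, which should make the computation faster for large datasets.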

import datetime
from datetime import date

import numpy as np

#test 2-dim array with doy values
doy = np.array([[272, 272],
                [274, 274]])

#define start and end date
startdat = datetime.datetime.strptime('2012 10 01 0000', '%Y %m %d %H%M')
year_start = int(startdat.strftime('%Y'))
enddat = datetime.datetime.strptime('2013 09 30 0000', '%Y %m %d %H%M')
year_end = int(enddat.strftime('%Y'))

#initialise a temporary array
res_date = np.zeros([2,2]) 

#transform doy into date    
for x in range(2):
    for y in range(2):
        if doy[x,y] >= 274 and doy[x,y] <= 365:
            datum = date.fromordinal(date(year_start, 1, 1).toordinal() + doy[x,y])
            datum = datum.strftime('%Y%m%d')    
            res_date[x,y]= datum
        else:
            datum = date.fromordinal(date(year_end, 1, 1).toordinal() + doy[x,y])
            datum = datum.strftime('%Y%m%d')    
            res_date[x,y]= datum
#that's my result
#res_date = array([[ 20130930.,  20130930.],
                  #[ 20121001.,  20121001.]])  
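
A loop-free sketch of the same arithmetic, for comparison. It assumes the rule used in the loop above (doy values from 274 to 365 belong to the start year, everything else to the end year, and the date is 1 January plus doy days); the names `years` and `jan1` are only illustrative.

```python
import numpy as np

doy = np.array([[272, 272],
                [274, 274]])
year_start, year_end = 2012, 2013

# Choose the calendar year per element with a boolean mask instead of an if/else
years = np.where((doy >= 274) & (doy <= 365), year_start, year_end)

# 1 January of each element's year: datetime64[Y] counts years since 1970
jan1 = (years - 1970).astype('datetime64[Y]').astype('datetime64[D]')

# Same arithmetic as the loop: 1 January plus doy days
dates = jan1 + doy.astype('timedelta64[D]')
print(dates)
```

The result is a `datetime64[D]` array holding 2013-09-30 in the first row and 2012-10-01 in the second, which matches the loop's output before it is formatted as `%Y%m%d`.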

3 answers:

Answer 0 (score: 0)

You can do this kind of thing:

offset = (datetime.datetime(2013, 9, 30) - datetime.datetime(2012, 12, 31)).days
yearlen = (datetime.datetime(2013, 1, 1) - datetime.datetime(2012, 1, 1)).days
doy[doy >= offset] -= yearlen
dates = np.datetime64('2013-01-01') + doy

But extracting YMD from datetime64 values is a bit tricky. The consensus is to use pandas. Why do you need the array in that format?
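
If pandas is not an option, one possible numpy-only way to pull year, month and day out of a datetime64 array is to cast it to coarser units and take differences. This is a sketch rather than an established recipe, and the sample dates and variable names are made up for illustration.

```python
import numpy as np

dates = np.array(['2013-09-30', '2012-10-01'], dtype='datetime64[D]')

# Year: datetime64[Y] stores whole years since 1970
years = dates.astype('datetime64[Y]').astype(int) + 1970

# Month: datetime64[M] stores whole months since 1970-01
months = dates.astype('datetime64[M]').astype(int) % 12 + 1

# Day: distance in days from the first day of the same month
month_start = dates.astype('datetime64[M]').astype('datetime64[D]')
days = (dates - month_start).astype(int) + 1

# Pack into YYYYMMDD integers
ymd = years * 10000 + months * 100 + days
print(ymd)
```

Every step is an array operation, so it works unchanged on 2-dim input.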

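Edit: I've added the year calculation, but I haven't thought through all the permutations, so you may need to check it against a calendar!

Further edit. Judging by the wording of your next question, your doys appear to be counted from 2011-09-30 (or 2011-10-01 minus 1), i.e.: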

import numpy as np
import datetime

#counting the doys from the 1. of October to the 30 of September
doy = np.array([[0, 4],
                [7, 93]])

#read start and enddat
startdat = datetime.datetime(2011,10,1)
dates = np.datetime64(startdat).astype('datetime64[D]') + doy - 1

# NB I don't think the datetime64 format takes too much space 'inside' numpy,
# it just looks bulky. From the name I would assume it uses a 64 bit integer,
# which is only 8 bytes for each value in memory (standard integers are 32 bits).
#
#... but if you want to convert to floats this is a rather ugly way of doing it
# NB it's specifically a 2D array and rather undoes the whole point of using numpy!
# (the np.float alias has been removed from recent numpy, so use the builtin float)
dates = np.array([[d.year * 10000 + d.month * 100 + d.day for d in c]
                                 for c in dates.tolist()], dtype=float)
print('raw version\n', dates)
dates[doy == 0] = np.nan
print('nan version\n', dates)

#raw version
# [[ 20110930.  20111004.]
# [ 20111007.  20120101.]]
#nan version
# [[       nan  20111004.]
# [ 20111007.  20120101.]]
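
The memory remark in the comments above can be checked directly. A small sketch, using a made-up 2x2 array of the same dates:

```python
import numpy as np

dates = np.array([['2011-09-30', '2011-10-04'],
                  ['2011-10-07', '2012-01-01']], dtype='datetime64[D]')

print(dates.dtype.itemsize)            # bytes per datetime64 element
print(np.dtype(np.float64).itemsize)   # bytes per float64 element
print(dates.nbytes)                    # total bytes for the 2x2 array
```

Both element sizes are 8 bytes, so converting to float64 does not save any memory compared with keeping datetime64.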

Answer 1 (score: 0)

Sorry for answering so late. I hoped I could find a way around all these datetime transformations, but unfortunately there is none, and I'm still confused by all the transformations, day shifts and leap years. My problem is still the same: how do I get a correct date from my doy values, which count a year from 1 October until 30 September? To get a correct date out of these doy values, I first tried to correct the offset. Last time I gave you the corrected offset values and you gave me an elegant piece of code to convert the corrected doys into dates. Secondly, I tried to convert the corrected doys into dates. Below you will find the whole code, including your part. The code works well for non-leap years, but I still don't know how to handle leap years and get a correct date for them too. Maybe you have an idea how to handle leap years, and maybe there is a better way to do the offset correction. At the moment my dates within a leap year are off by 2 days. These datetime transformations really are a bit confusing. It would be great if you had an idea how to handle this problem. Thank you so far!

Convert doy into date:

#import modules
import numpy as np
import datetime
import copy

#test_data
#counting the doys from the 1. of October to the 30 of September
doy = np.array([[152, 4],
                [7, 93]])

#read start and enddat
startdat = datetime.datetime.strptime('2011 10 01 0000', '%Y %m %d %H%M')
enddat = datetime.datetime.strptime('2012 09 30 0000', '%Y %m %d %H%M')
year_startdat = int(startdat.strftime('%Y'))
year_enddat = int(enddat.strftime('%Y'))
yeardays = (enddat - startdat) + datetime.timedelta(days=1)

#correct the doy offset in order to transform doy into a date
doy_corr = copy.copy(doy)
if yeardays == datetime.timedelta(366):
    print('is leap year!')
    doy_corr[(doy >=1.) & (doy <= 92)] += 274
    doy_corr[(doy >=93.) & (doy <= 366)] -= 93
#correct the doy offset if there is no leap year
else:
    print('no leap year!')
    doy_corr[(doy >=1.) & (doy <= 92)] += 273
    doy_corr[(doy >=93.) & (doy <= 365)] -= 93

#transform doy corrected into date. The offset is necessary to get the correct year
offset = (datetime.datetime(year_enddat, 9, 30) - datetime.datetime(year_startdat, 12, 31)).days
yearlen = (datetime.datetime(year_enddat, 1, 1) - datetime.datetime(year_startdat, 1, 1)).days
doy_corr[doy_corr >= offset] -= yearlen
dates = np.datetime64(str(year_enddat)+('-01-01')) + doy_corr

#my result should be 
#array([['2012-02-29', '2011-10-04'],
#      ['2011-10-07', '2012-01-01']], dtype='datetime64[D]')
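
One possible way to sidestep the leap-year bookkeeping altogether, sketched under the assumption that doy 1 corresponds to 1 October of the start year: add `doy - 1` days directly to the start date and let datetime64 do the calendar arithmetic. No offset correction and no leap-year branch are needed, because 29 February is simply one of the days being counted.

```python
import numpy as np

# doys counted from 1 October (doy 1) to 30 September
doy = np.array([[152, 4],
                [7, 93]])

start = np.datetime64('2011-10-01')   # first day of the Oct-Sep year

# doy 1 is the start date itself, so shift by doy - 1 days
dates = start + (doy - 1).astype('timedelta64[D]')
print(dates)
```

For this test data the sketch yields 2012-02-29, 2011-10-04, 2011-10-07 and 2012-01-01, which is the result asked for above.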

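Answer 2 (score: 0)

How can I replace the zeros with NA values in an np.datetime64 array (a zero stands for a missing value), and how can I convert an np.datetime64 array into a float or int array?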

#import modules
import numpy as np
import datetime

#counting the doys from the 1. of October to the 30 of September
#Zero stands for NA
doy = np.array([[0, 4],
                [7, 93]])

#define startdat so that the dates are counted from the 1st of October
startdat = datetime.datetime(2011,10,1)
dates = np.datetime64(startdat).astype('datetime64[D]') + doy - 1
print(dates)
#convert the datetime array into a string
dates_str = np.datetime_as_string(dates)
#replace the false date-values with NA
ind = np.where(dates_str == '2011-09-30')
dates_str[ind] = 'NA'

#My favored result:
#array([[nan, 20111004.],
#       [20111007., 20120101.]],
#      dtype='float')
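
A hedged sketch of one way to get there, building on the string conversion used above: strip the dashes from the date strings, cast to float, and set the cells that came from doy 0 to NaN. Inside the datetime64 array itself, `NaT` is the missing-value marker. The names `missing` and `as_float` are illustrative only.

```python
import numpy as np

doy = np.array([[0, 4],
                [7, 93]])
missing = doy == 0                     # zero marks a missing value

start = np.datetime64('2011-10-01')
dates = start + (doy - 1).astype('timedelta64[D]')

# 'YYYY-MM-DD' strings -> 'YYYYMMDD' strings -> floats
as_float = np.char.replace(np.datetime_as_string(dates), '-', '').astype(float)
as_float[missing] = np.nan             # blank out the cells that had doy 0

# In the datetime64 array, NaT plays the role of NaN
dates[missing] = np.datetime64('NaT')

print(as_float)
print(dates)
```

Masking on `doy == 0` avoids comparing against the hard-coded string '2011-09-30', so the same code keeps working when the start date changes.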