Question

I have some monthly data whose date column is in the format YYYY.fractional month. For example:

0    1960.500
1    1960.583
2    1960.667
3    1960.750
4    1960.833
5    1960.917

The first entry is June 1960 (6/12 = .5), the second is July 1960 (7/12 = .583), and so on.

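The answers in this question do not seem to apply, but I feel that pd.to_datetime should be able to help somehow. Obviously I could use map to split each value into its components and build the datetimes, but I am hoping for a faster, more rigorous approach because the data is large.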

Answer 1

I think you need some math:

a = df['date'].astype(int)
print (a)
0    1960
1    1960
2    1960
3    1960
4    1960
5    1960
Name: date, dtype: int32

b = df['date'].sub(a).add(1/12).mul(12).round(0).astype(int)
print (b)
0     7
1     8
2     9
3    10
4    11
5    12
Name: date, dtype: int32

c = pd.to_datetime(a.astype(str) + '.' + b.astype(str), format='%Y.%m')
print (c)
0   1960-07-01
1   1960-08-01
2   1960-09-01
3   1960-10-01
4   1960-11-01
5   1960-12-01
Name: date, dtype: datetime64[ns]
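
The same arithmetic can be condensed into one helper. This is only a rough sketch: the name fractional_year_to_datetime is made up for illustration, and it assumes that YYYY.0 means January (the convention the asker confirms in Answer 2). Rounding the value times 12 to a whole month count avoids the string round trip and keeps the year and month consistent even when a fraction rounds up.

```python
import pandas as pd

def fractional_year_to_datetime(s):
    """Convert floats like 1960.583 to month-start timestamps.

    Assumes YYYY.0 is January, so the fraction equals (month - 1) / 12.
    """
    # Whole months counted from year 0; rounding absorbs the
    # three-decimal truncation in values such as 1960.583.
    total_months = (s * 12).round().astype("int64")
    parts = pd.DataFrame({
        "year": total_months // 12,
        "month": total_months % 12 + 1,
        "day": 1,
    })
    # pd.to_datetime assembles dates from year/month/day columns.
    return pd.to_datetime(parts)

df = pd.DataFrame({"date": [1960.500, 1960.583, 1960.667,
                            1960.750, 1960.833, 1960.917]})
print(fractional_year_to_datetime(df["date"]))
```

If the data instead followed the convention described in the question (June stored as .5), the "+ 1" on the month would have to be dropped and December handled separately.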

A solution with map:

d = {'500':'7','583':'8','667':'9','750':'10','833':'11','917':'12'}

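# if the column is numeric, format it to three decimals first
# (plain astype(str) would turn 1960.500 into '1960.5')
#df['date'] = df['date'].map('{:.3f}'.format)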
a = df['date'].str[:4]
b = df['date'].str[5:].map(d)

c = pd.to_datetime(a + '.' + b, format='%Y.%m')
print (c)
0   1960-07-01
1   1960-08-01
2   1960-09-01
3   1960-10-01
4   1960-11-01
5   1960-12-01
Name: date, dtype: datetime64[ns]
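
The dictionary above only covers the six months in the sample. A lookup for all twelve months could be generated rather than typed by hand. This is a sketch under two assumptions: the column holds strings with exactly three decimals, and YYYY.000 means January. The names month_lookup, dates, year, month and result are illustrative only.

```python
import pandas as pd

# Keys are the three-digit fractions '000', '083', ..., '917';
# values are zero-padded month numbers '01' ... '12'.
month_lookup = {f"{m / 12:.3f}"[2:]: f"{m + 1:02d}" for m in range(12)}

dates = pd.Series(["1960.000", "1960.500", "1960.583", "1960.917"])
year = dates.str[:4]
month = dates.str[5:].map(month_lookup)

result = pd.to_datetime(year + "." + month, format="%Y.%m")
print(result)
```

Any fraction that is not in the lookup maps to NaN, so unexpected values fail at the pd.to_datetime step instead of being converted silently.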

Answer 2

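For future reference, here is the map I was using before. I actually made a mistake in the question: the data is set up so that January 1960 is 1960.0, which means 1/12 has to be added to each fractional part.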

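import datetime

def date_conv(d):
    # split e.g. 1960.583 into the year and the digits after the decimal point
    y, frac_m = str(d).split('.')
    y = int(y)
    # January is stored as .0, so add 1/12 before scaling to a month number
    m = int(round((float('0.{}'.format(frac_m)) + 1/12) * 12, 0))
    day = 1
    try:
        date = datetime.datetime(year=y, month=m, day=day)
    except ValueError:
        # troubleshooting aid: show the values that failed to convert
        print(y, m, frac_m)
        raise
    return date

dates_series = dates_series.map(date_conv)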

The try/except block is just something I added for troubleshooting while writing it.

Pandas: converting fractional months to datetime
