pandas: shift dates using the values of another column

Time: 2016-08-16 18:24:11

Tags: python date datetime pandas time-series

Code to create the test data:

 import pandas as pd
 df = pd.DataFrame({'A': pd.date_range(start='1-1-2016',periods=5, freq='M')})
 df['B'] = df.A.dt.month
 print(df)

The data looks like this:

   B          A
0  1     2016-01-31
1  2     2016-02-29
2  3     2016-03-31
3  4     2016-04-30
4  5     2016-05-31

How can I shift column A back by the number of months given by the value in column B?

Something to the effect of:
 df['A'] - pd.DateOffset(months=value_from_column_B)

2 answers:

Answer 0: (score: 3)

You can try:

df['C'] = df[['A', 'B']].apply(lambda x: x['A'] - pd.DateOffset(months=x['B']), axis=1)
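
For reference, here is a self-contained sketch of the row-wise approach applied to the question's sample data. The dates are typed out explicitly rather than generated with `pd.date_range`, and the `int(...)` cast is a defensive assumption added here, not part of the answer itself.

```python
import pandas as pd

# Same five month-end dates as the question's sample data, written out explicitly.
df = pd.DataFrame({'A': pd.to_datetime(['2016-01-31', '2016-02-29', '2016-03-31',
                                        '2016-04-30', '2016-05-31'])})
df['B'] = df['A'].dt.month

# Row by row: subtract B months from A. int() is a defensive cast (assumption).
df['C'] = df.apply(lambda row: row['A'] - pd.DateOffset(months=int(row['B'])), axis=1)

shifted = [str(ts.date()) for ts in df['C']]
print(shifted)
```

Because `DateOffset` is applied once per row, this is easy to read but scales poorly on large frames.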

Answer 1: (score: 3)

Here is a vectorized way to compose arrays of dates (NumPy datetime64s) from date components (e.g. years, months, days):

import numpy as np
import pandas as pd

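def compose_date(years, months=1, days=1, weeks=None, hours=None, minutes=None,
                 seconds=None, milliseconds=None, microseconds=None, nanoseconds=None):
    """Build a datetime64 array from arrays of date components."""
    # Convert calendar values to offsets from the epoch 1970-01-01.
    years = np.asarray(years) - 1970
    months = np.asarray(months) - 1
    days = np.asarray(days) - 1
    # A datetime64 unit for the years, then one timedelta64 unit per component.
    types = ('<M8[Y]', '<m8[M]', '<m8[D]', '<m8[W]', '<m8[h]',
             '<m8[m]', '<m8[s]', '<m8[ms]', '<m8[us]', '<m8[ns]')
    vals = (years, months, days, weeks, hours, minutes, seconds,
            milliseconds, microseconds, nanoseconds)
    # Cast each supplied component to its unit and add them all together.
    return sum(np.asarray(v, dtype=t) for t, v in zip(types, vals)
               if v is not None)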

df = pd.DataFrame({'A': pd.date_range(start='1-1-2016',periods=5, freq='M')})
df['B'] = df['A'].dt.month
df['C'] = compose_date(years=df['A'].dt.year, 
                       months=df['A'].dt.month-df['B'], 
                       days=df['A'].dt.day)
print(df)
#            A  B          C
# 0 2016-01-31  1 2015-12-31
# 1 2016-02-29  2 2015-12-29
# 2 2016-03-31  3 2015-12-31
# 3 2016-04-30  4 2015-12-30
# 4 2016-05-31  5 2015-12-31
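
The reason a month value of 0 lands in the previous December can be seen from the underlying datetime64 arithmetic. The following is a minimal sketch of what the function does for the first row (year 2016, month 1 - 1 = 0, day 31); the variable names are chosen here for illustration.

```python
import numpy as np

# Year 2016 expressed as years since the 1970 epoch, in datetime64[Y].
base = np.array([2016 - 1970], dtype='M8[Y]')
# Month 0 becomes an offset of -1 month from January, i.e. December 2015.
month_offset = np.array([0 - 1], dtype='m8[M]')
# Day 31 becomes an offset of 30 days from the first of the month.
day_offset = np.array([31 - 1], dtype='m8[D]')

# Adding a finer unit promotes the result: [Y] + [M] -> [M], then + [D] -> [D].
result = base + month_offset + day_offset
print(result)  # ['2015-12-31']
```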

Timing both approaches on 1,000 rows:

In [135]: df = pd.DataFrame({'A': pd.date_range(start='1-1-2016', periods=10**3, freq='M')})

In [136]: df['B'] = df['A'].dt.month

In [137]: %timeit compose_date(years=df['A'].dt.year, months=df['A'].dt.month-df['B'], days=df['A'].dt.day)
10 loops, best of 3: 41.2 ms per loop

In [138]: %timeit df[['A', 'B']].apply(lambda x: x['A'] - pd.DateOffset(months=x['B']), axis=1)
10 loops, best of 3: 169 ms per loop
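
The two approaches agree on the sample data, but they may differ when the target month is shorter than the source day. `pd.DateOffset` clips to the last valid day of the month, whereas adding a day offset to the first of the month rolls over into the next month. The sketch below illustrates this with a trimmed copy of `compose_date` (years, months, and days only, a simplification made here) for 2016-03-31 shifted back by one month.

```python
import numpy as np
import pandas as pd

def compose_date(years, months=1, days=1):
    # Trimmed copy of the answer's function: years, months, days only.
    years = np.asarray(years) - 1970
    months = np.asarray(months) - 1
    days = np.asarray(days) - 1
    return (np.asarray(years, dtype='M8[Y]')
            + np.asarray(months, dtype='m8[M]')
            + np.asarray(days, dtype='m8[D]'))

# DateOffset clips day 31 to the end of February 2016.
via_offset = pd.Timestamp('2016-03-31') - pd.DateOffset(months=1)

# compose_date starts at 2016-02-01 and adds 30 days, spilling into March.
via_compose = compose_date([2016], [3 - 1], [31])[0]

print(via_offset.date(), via_compose)
```

If month-end clipping matters for the data at hand, the results of the vectorized version should be checked against `DateOffset` on a few such edge cases.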