熊猫:我怎样才能生成“年 - 月”格式栏(期间)?

时间:2017-09-19 22:34:41

标签: python pandas dataframe

In [20]: df.head()
Out[20]:
   year  month     capital       sales      income      profit         debt 
0  2000      6 -19250379.0  37924704.0  -4348337.0   2571738.0  192842551.0
1  2000     12 -68357153.0  27870187.0 -49074146.0 -20764204.0  190848380.0
2  2001      6 -65048960.0  30529435.0  -1172803.0   2000427.0  197383572.0
3  2001     12 -90129943.0  17135480.0 -24208501.0  -1012230.0  191464941.0
4  2002      6  14671980.0  31377347.0   2188125.0   3660938.0  101355088.0

我尝试的是:

df['date'] = pd.to_datetime(df['year']*10000 + df['month']*100, format="%Y%m")

但是发生了错误:

In [21]: df['date'] = pd.to_datetime(df['year']*10000 + df['month']*100, format="%Y%m"
    ...: )
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-21-31bfca8c5941> in <module>()
----> 1 df['date'] = pd.to_datetime(df['year']*10000 + df['month']*100, format="%Y%m")

~/.pyenv/versions/3.6.0/lib/python3.6/site-packages/pandas/core/tools/datetimes.py in to_datetime(arg, errors, dayfirst, yearfirst, utc, box, format, exact, unit, infer_datetime_format, origin)
    507     elif isinstance(arg, ABCSeries):
    508         from pandas import Series
--> 509         values = _convert_listlike(arg._values, False, format)
    510         result = Series(values, index=arg.index, name=arg.name)
    511     elif isinstance(arg, (ABCDataFrame, MutableMapping)):

~/.pyenv/versions/3.6.0/lib/python3.6/site-packages/pandas/core/tools/datetimes.py in _convert_listlike(arg, box, format, name, tz)
    412                     try:
    413                         result = tslib.array_strptime(arg, format, exact=exact,
--> 414                                                       errors=errors)
    415                     except tslib.OutOfBoundsDatetime:
    416                         if errors == 'raise':

pandas/_libs/tslib.pyx in pandas._libs.tslib.array_strptime (pandas/_libs/tslib.c:63753)()

TypeError: 'int' object is unsliceable

我认为这是因为错过了'日'。

我该怎么做?

1 个答案:

答案 0 :(得分:1)

In [223]: df['date'] = pd.to_datetime(df[['year','month']].assign(day=1)).dt.to_period('M')

In [224]: df
Out[224]:
   year  month     capital       sales      income      profit         debt    date
0  2000      6 -19250379.0  37924704.0  -4348337.0   2571738.0  192842551.0 2000-06
1  2000     12 -68357153.0  27870187.0 -49074146.0 -20764204.0  190848380.0 2000-12
2  2001      6 -65048960.0  30529435.0  -1172803.0   2000427.0  197383572.0 2001-06
3  2001     12 -90129943.0  17135480.0 -24208501.0  -1012230.0  191464941.0 2001-12
4  2002      6  14671980.0  31377347.0   2188125.0   3660938.0  101355088.0 2002-06

In [208]: df['date'] = pd.PeriodIndex(pd.to_datetime(df[['year','month']].assign(day=1)),
                                      freq='M')

In [209]: df
Out[209]:
   year  month     capital       sales      income      profit         debt    date
0  2000      6 -19250379.0  37924704.0  -4348337.0   2571738.0  192842551.0 2000-06
1  2000     12 -68357153.0  27870187.0 -49074146.0 -20764204.0  190848380.0 2000-12
2  2001      6 -65048960.0  30529435.0  -1172803.0   2000427.0  197383572.0 2001-06
3  2001     12 -90129943.0  17135480.0 -24208501.0  -1012230.0  191464941.0 2001-12
4  2002      6  14671980.0  31377347.0   2188125.0   3660938.0  101355088.0 2002-06