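If the previous month is January, subtract one year

Time: 2017-08-24 08:32:23

Tags: python pandas

After finishing all the processing, I was able to create the following dataframe. The only problem is that the year is wrong. The dates for each Location are in decreasing order, so after 2015-01-15 the next date should be 2014-12-15, not 2015-12-15.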

+----------+------------+-------+
| Location | Date       | Value |
+----------+------------+-------+
| India    | 2015-03-15 |  -200 |
| India    | 2015-02-15 |   140 |
| India    | 2015-01-15 |   155 |
| India    | 2015-12-15 |    85 |
| India    | 2015-11-15 |    45 |
| China    | 2015-03-15 |   199 |
| China    | 2015-02-15 |   164 |
| China    | 2015-01-15 |   209 |
| China    | 2015-12-15 |    24 |
| China    | 2015-11-15 |    11 |
| Russia   | 2015-03-15 |    48 |
| Russia   | 2015-02-15 |   104 |
| Russia   | 2015-01-15 |   106 |
| Russia   | 2015-12-15 |   -20 |
| Russia   | 2015-11-15 |    10 |
+----------+------------+-------+
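
The symptom can also be checked programmatically. The following is only a minimal sketch: it rebuilds the India rows from the table by hand (the variable names are illustrative) and tests whether the dates of each Location really decrease from row to row.

```python
import pandas as pd

# India rows from the table above, rebuilt by hand to keep the sketch short.
df = pd.DataFrame({
    'Location': ['India'] * 5,
    'Date': pd.to_datetime(['2015-03-15', '2015-02-15', '2015-01-15',
                            '2015-12-15', '2015-11-15']),
})

# True for a Location whose dates never increase from one row to the next.
is_descending = df.groupby('Location')['Date'].apply(
    lambda dates: dates.is_monotonic_decreasing)
print(is_descending)
```

A False entry marks a Location whose years still need fixing; once the years are corrected, the same check should return True for every Location.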

3 Answers:

Answer 0 (score: 2)

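Assuming these are monthly dates that fall on the 15th of each month, and that the first value for a given Location is correct, we can step back one month at a time within each Location.

The final dates are in string form. You may want to convert them back to timestamps afterwards, for example with the same pd.DatetimeIndex(df['Date']) call that the code below uses on the original column.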

import pandas as pd

# Create original dataframe.
df = pd.DataFrame({'Location': ['India'] * 5 + ['China'] * 5 + ['Russia'] * 5,
                   'Date': ['2015-03-15', '2015-02-15', '2015-01-15', '2015-12-15', '2015-11-15'] * 3,
                   'Value': [-200, 140, 155, 85, 45, 199, 164, 209, 24, 11, 48, 104, 106, -20, 10]})[
    ['Location', 'Date', 'Value']
]
# Convert dates to pandas Timestamps.
df['Date'] = pd.DatetimeIndex(df['Date'])

# sort=False keeps the groups in order of first appearance (India, China, Russia)
# so the generated dates line up with the rows. Each Location's rows must be contiguous.
gb = df.groupby(['Location'], sort=False)['Date']
# For each Location, start at the month of its first date and step back one month per row.
df['Date'] = [
    str(first_period - months) + '-15'
    for location_months, first_period in zip(
        gb.count(), gb.first().apply(lambda date: pd.Period(date, 'M')))
    for months in range(location_months)
]
>>> df
   Location        Date  Value
0     India  2015-03-15   -200
1     India  2015-02-15    140
2     India  2015-01-15    155
3     India  2014-12-15     85
4     India  2014-11-15     45
5     China  2015-03-15    199
6     China  2015-02-15    164
7     China  2015-01-15    209
8     China  2014-12-15     24
9     China  2014-11-15     11
10   Russia  2015-03-15     48
11   Russia  2015-02-15    104
12   Russia  2015-01-15    106
13   Russia  2014-12-15    -20
14   Russia  2014-11-15     10
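
If you would rather keep the column as timestamps the whole way, the same idea can be sketched with transform and cumcount. This is an illustrative variant, not the code above: it assumes the first date of each Location is correct and that every following row is exactly one month earlier, and the names first_date and steps_back are made up for the example.

```python
import pandas as pd

df = pd.DataFrame({
    'Location': ['India'] * 5 + ['China'] * 5 + ['Russia'] * 5,
    'Date': pd.to_datetime(['2015-03-15', '2015-02-15', '2015-01-15',
                            '2015-12-15', '2015-11-15'] * 3),
    'Value': [-200, 140, 155, 85, 45, 199, 164, 209, 24, 11, 48, 104, 106, -20, 10],
})

grouped = df.groupby('Location', sort=False)
# First (trusted) date of each Location, repeated on every row of that Location.
first_date = grouped['Date'].transform('first')
# Position of each row inside its Location: 0, 1, 2, ...
steps_back = grouped.cumcount()

# Step back that many calendar months from the first date of the Location.
df['Date'] = [date - pd.DateOffset(months=int(k))
              for date, k in zip(first_date, steps_back)]
```

Because transform and cumcount return values aligned with the original index, this version does not depend on the order in which groupby lists the groups.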

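Answer 1 (score: 2)

You have to iterate over the Date series in the pandas DataFrame as shown below, check whether the previous date falls in January, and if so subtract one year from the date.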

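from dateutil.relativedelta import relativedelta

# Assumes df['Date'] holds Timestamps, df has the default 0..n-1 index,
# and each Location's rows are contiguous.
years_back = 0
for idx in range(len(df)):
    if idx == 0 or df['Location'].iloc[idx] != df['Location'].iloc[idx - 1]:
        years_back = 0   # first row of a Location: keep its year
    elif df['Date'].iloc[idx - 1].month == 1:
        years_back += 1  # the previous row was January, so a year boundary was crossed
    # Write the result back; reassigning the loop variable would not change df.
    df.loc[idx, 'Date'] = df['Date'].iloc[idx] - relativedelta(years=years_back)
    # alternative: df['Date'].iloc[idx] - pd.DateOffset(years=years_back)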

Edit: relativedelta takes leap years into account; alternatively, you can use pd.DateOffset(years=1) in its place.
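
The January check can also be written without an explicit loop. The following is a rough sketch of one way to do it, not the loop above: it counts, per Location, how many January rows come before each row and subtracts that many years. The names is_january and years_back are illustrative, and the sketch assumes none of the dates is 29 February, since such a date may not exist in the target year.

```python
import pandas as pd

df = pd.DataFrame({
    'Location': ['India'] * 5 + ['China'] * 5 + ['Russia'] * 5,
    'Date': pd.to_datetime(['2015-03-15', '2015-02-15', '2015-01-15',
                            '2015-12-15', '2015-11-15'] * 3),
})

# 1 where the row is in January, 0 otherwise.
is_january = (df['Date'].dt.month == 1).astype(int)
# Number of January rows strictly before each row, counted per Location.
years_back = is_january.groupby(df['Location']).cumsum() - is_january

# Rebuild the dates from their parts with the corrected year.
# Assumption: no 29 February dates, which could be invalid after the shift.
df['Date'] = pd.to_datetime(pd.DataFrame({
    'year': df['Date'].dt.year - years_back,
    'month': df['Date'].dt.month,
    'day': df['Date'].dt.day,
}))
```

Unlike a check on the immediately preceding row only, the running count also shifts every row after the first post-January row (here 2015-11-15 becomes 2014-11-15).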

Hope it helps!

Answer 2 (score: 1)

If you don't mind using a loop, you can do it like this:

import pandas as pd

dates = ["2015-03-15", "2015-02-15", "2015-01-15", "2015-12-15", "2015-11-15"] * 3

df = pd.DataFrame(dates, columns=['dt'])
# Five consecutive rows per country (sorted, so China, India, Russia).
df['country'] = sorted(['India', 'China', 'Russia'] * 5)

collect = []
for country in df.country.unique().tolist():
    year_ = 0  # year offset for this country; decremented after each January row
    for dt in df.loc[df.country == country, 'dt']:
        # Rebuild the date string with the adjusted year.
        collect.append(str(int(dt[:4]) + year_) + dt[4:])
        if int(dt[5:7]) == 1:
            year_ -= 1

# Assumes each country's rows are contiguous, so collect matches df's row order.
df['dt'] = collect