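After all the processing, I was able to create the following dataframe. The only problem is that the years are wrong. For each location the dates are in descending order, so after 2015-01-15 the next date should be 2014-12-15, not 2015-12-15.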
+----------+------------+-------+
| Location | Date       | Value |
+----------+------------+-------+
| India    | 2015-03-15 |  -200 |
| India    | 2015-02-15 |   140 |
| India    | 2015-01-15 |   155 |
| India    | 2015-12-15 |    85 |
| India    | 2015-11-15 |    45 |
| China    | 2015-03-15 |   199 |
| China    | 2015-02-15 |   164 |
| China    | 2015-01-15 |   209 |
| China    | 2015-12-15 |    24 |
| China    | 2015-11-15 |    11 |
| Russia   | 2015-03-15 |    48 |
| Russia   | 2015-02-15 |   104 |
| Russia   | 2015-01-15 |   106 |
| Russia   | 2015-12-15 |   -20 |
| Russia   | 2015-11-15 |    10 |
+----------+------------+-------+
Answer 0 (score: 2)
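Making the strong assumption that these are monthly dates falling on the 15th of each month, and that the first value for a given Location is correct, we can step back one month at a time within each Location. Note that the final dates are strings; you may want to convert them back to timestamps afterwards, e.g. with pd.to_datetime(df['Date']).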
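import pandas as pd

# Create the original dataframe.
df = pd.DataFrame({'Location': ['India'] * 5 + ['China'] * 5 + ['Russia'] * 5,
                   'Date': ['2015-03-15', '2015-02-15', '2015-01-15', '2015-12-15', '2015-11-15'] * 3,
                   'Value': [-200, 140, 155, 85, 45, 199, 164, 209, 24, 11, 48, 104, 106, -20, 10]})[
    ['Location', 'Date', 'Value']
]
# Convert dates to pandas Timestamps.
df['Date'] = pd.to_datetime(df['Date'])
# sort=False keeps the groups in the order they appear in df. By default groupby
# sorts the keys (China, India, Russia), which can pair new dates with the wrong rows.
gb = df.groupby('Location', sort=False)['Date']
# For each location, turn the first date into a monthly Period and step back one
# month per row. This relies on each location's rows being contiguous.
df['Date'] = [
    str(first_period - months) + '-15'
    for location_months, first_period in zip(
        gb.count(), gb.first().apply(lambda date: pd.Period(date, 'M')))
    for months in range(location_months)
]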
>>> df
Location Date Value
0 India 2015-03-15 -200
1 India 2015-02-15 140
2 India 2015-01-15 155
3 India 2014-12-15 85
4 India 2014-11-15 45
5 China 2015-03-15 199
6 China 2015-02-15 164
7 China 2015-01-15 209
8 China 2014-12-15 24
9 China 2014-11-15 11
10 Russia 2015-03-15 48
11 Russia 2015-02-15 104
12 Russia 2015-01-15 106
13 Russia 2014-12-15 -20
14 Russia 2014-11-15 10
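The step-back in this answer works because subtracting an integer from a monthly pandas Period moves back that many months and rolls over the year boundary on its own. A minimal standalone sketch of just that arithmetic, plus the pd.to_datetime conversion mentioned above (the names first, months and dates are illustrative only):

```python
import pandas as pd

# Subtracting an integer k from a monthly Period steps back k months,
# crossing into the previous year automatically.
first = pd.Period('2015-03', freq='M')
months = [str(first - k) + '-15' for k in range(5)]
print(months)  # ['2015-03-15', '2015-02-15', '2015-01-15', '2014-12-15', '2014-11-15']

# The answer produces strings; pd.to_datetime turns them back into Timestamps.
dates = pd.to_datetime(pd.Series(months))
print(dates.dtype)
```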
Answer 1 (score: 2)
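You can iterate over the date series of the dataframe and check whether the previous date is in January. Each time it is, every following date for that Location must move back one more year, so the shift accumulates, and it resets when a new Location starts. This assumes Date already holds Timestamps (e.g. after pd.to_datetime):
from dateutil.relativedelta import relativedelta
offset = 0
dates = list(df['Date'])
for i in range(1, len(df)):
    if df['Location'].iloc[i] != df['Location'].iloc[i - 1]:
        offset = 0  # a new location starts in the correct year
    elif df['Date'].iloc[i - 1].month == 1:
        offset += 1  # just passed January: go back one more year
    dates[i] = df['Date'].iloc[i] - relativedelta(years=offset)
df['Date'] = dates
Edit: relativedelta takes care of leap years; alternatively, you can use pd.DateOffset(years=offset) in its place.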
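To see why a calendar-aware offset matters, compare it with a fixed 365-day step on a leap day. A minimal sketch (leap_day is an illustrative name); both relativedelta and pd.DateOffset clamp to the last valid day of the month, while a plain Timedelta does not:

```python
import pandas as pd
from dateutil.relativedelta import relativedelta

leap_day = pd.Timestamp('2016-02-29')

# Calendar-aware: both clamp to 2015-02-28, the last day of February 2015.
print(leap_day - relativedelta(years=1))
print(leap_day - pd.DateOffset(years=1))

# A fixed 365-day step lands on a different calendar date across a leap year.
print(leap_day - pd.Timedelta(days=365))
```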
Hope it helps!
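The same "go back one more year after each January" rule can also be written without an explicit row loop. This is a swapped-in technique (groupby shift plus cumsum), not part of the answer above; a minimal sketch that rebuilds the question's data and assumes each Location's rows are contiguous and in descending date order:

```python
import pandas as pd

df = pd.DataFrame({
    'Location': ['India'] * 5 + ['China'] * 5 + ['Russia'] * 5,
    'Date': pd.to_datetime(['2015-03-15', '2015-02-15', '2015-01-15',
                            '2015-12-15', '2015-11-15'] * 3),
    'Value': [-200, 140, 155, 85, 45, 199, 164, 209, 24, 11, 48, 104, 106, -20, 10],
})

# True where the previous row of the same Location is a January
# (the first row of each Location gets NaT from shift, hence False).
prev_is_jan = (
    df.groupby('Location', sort=False)['Date']
      .shift()
      .dt.month.eq(1)
      .astype(int)
)
# Running count of Januaries passed so far = number of years to go back.
years_back = prev_is_jan.groupby(df['Location'], sort=False).cumsum()

df['Date'] = [d - pd.DateOffset(years=int(n)) for d, n in zip(df['Date'], years_back)]
print(df)
```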
Answer 2 (score: 1)
If you don't mind using a loop, you can do it like this:
import pandas as pd

dt = ["2015-03-15", "2015-02-15", "2015-01-15", "2015-12-15", "2015-11-15"] * 3
df = pd.DataFrame(dt, columns=['dt'])
cntry = ['India', 'China', 'Russia'] * 5
cntry.sort()  # groups the rows by country: China, India, Russia
df['country'] = cntry

collect = []
for country in df.country.unique():
    year_ = 0  # year shift for the current country: 0, then -1, -2, ...
    for d in df.loc[df.country == country, 'dt']:
        collect.append(str(int(d[:4]) + year_) + d[4:])
        if int(d[5:7]) == 1:  # after January, later rows belong to the previous year
            year_ -= 1
df['dt'] = collect
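The same string-based loop can be packaged as a per-group function and applied with groupby transform, which writes the results back aligned to the original rows, so they need not be collected by hand. A minimal sketch; fix_years is a hypothetical helper name, not from the answer, and it assumes 'YYYY-MM-DD' strings in descending order within each country:

```python
import pandas as pd

def fix_years(dates):
    """Hypothetical helper: shift years back for a descending run of
    'YYYY-MM-DD' strings, moving one more year back after each January."""
    shift = 0
    fixed = []
    for d in dates:
        fixed.append(f"{int(d[:4]) + shift}{d[4:]}")
        if d[5:7] == '01':
            shift -= 1
    return fixed

df = pd.DataFrame({
    'country': ['China'] * 5 + ['India'] * 5 + ['Russia'] * 5,
    'dt': ['2015-03-15', '2015-02-15', '2015-01-15', '2015-12-15', '2015-11-15'] * 3,
})
# transform runs the helper once per country and keeps the original row order.
df['dt'] = df.groupby('country')['dt'].transform(
    lambda s: pd.Series(fix_years(list(s)), index=s.index))
print(df)
```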