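I have a DataFrame, index_crisis, and I want to create a new column that contains 1 where the index reaches a local peak and 0 everywhere else.
I don't know how to continue from my code. The peak_locations list is
[2 7 9 13 16 18 21], but month[peak_locations] only gives me the peak months.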
Date Index
38 2007-06-01 -0.56
39 2007-07-01 -0.36
40 2007-08-01 0.68
41 2007-09-01 0.24
42 2007-10-01 0.22
43 2007-11-01 0.89
44 2007-12-01 0.95
45 2008-01-01 1.53
46 2008-02-01 1.01
47 2008-03-01 1.73
48 2008-04-01 1.39
49 2008-05-01 0.96
50 2008-06-01 1.26
51 2008-07-01 2.37
52 2008-08-01 1.57
53 2008-09-01 2.95
54 2008-10-01 5.7
55 2008-11-01 5.29
56 2008-12-01 5.42
57 2009-01-01 4.99
58 2009-02-01 4.45
59 2009-03-01 4.59
60 2009-04-01 4.2
61 2009-05-01 3.12
62 2009-06-01 1.85
My expected output is a dummy column like this:
0
0
1
0
0
0
0
1
0
1
0
0
0
1
0
0
1
0
1
0
0
1
0
0
0
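My code so far:
import numpy as np
import pandas as pd
df = pd.read_csv("index_crisis.csv", parse_dates=True)
df['Date'] = pd.to_datetime(df['Date'])
# Show dates as e.g. "Jun 2007"
df['Date'] = pd.PeriodIndex(df.Date, freq='M').strftime("%b %Y")
data = df['Index'].values
# The sign of the slope goes from +1 to -1 at a strict local peak, so the second difference is -2
doublediff = np.diff(np.sign(np.diff(data)))
# +1 because each diff shortens the array by one element at the front
peak_locations = np.where(doublediff == -2)[0] + 1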
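Answer 0 (score: 1)
# Translate the positional peak locations into index labels
idx = df.iloc[peak_locations].index
# 1 where the row label is one of the peak labels, 0 otherwise
df['dummy'] = np.where(df.index.isin(idx), 1, 0)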
Date Index dummy
38 Jun 2007 -0.56 0
39 Jul 2007 -0.36 0
40 Aug 2007 0.68 1
41 Sep 2007 0.24 0
42 Oct 2007 0.22 0
43 Nov 2007 0.89 0
44 Dec 2007 0.95 0
45 Jan 2008 1.53 1
46 Feb 2008 1.01 0
47 Mar 2008 1.73 1
48 Apr 2008 1.39 0
49 May 2008 0.96 0
50 Jun 2008 1.26 0
51 Jul 2008 2.37 1
52 Aug 2008 1.57 0
53 Sep 2008 2.95 0
54 Oct 2008 5.7 1
55 Nov 2008 5.29 0
56 Dec 2008 5.42 1
57 Jan 2009 4.99 0
58 Feb 2009 4.45 0
59 Mar 2009 4.59 1
60 Apr 2009 4.2 0
61 May 2009 3.12 0
62 Jun 2009 1.85 0
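For readers without the CSV file, the approach above can be reproduced on the values shown in the question. This is a minimal sketch, not the asker's actual pipeline: the inline `values` list and the 38..62 row labels are copied from the printed table, and the `Date` column is left out as an assumption that it does not affect peak detection.

```python
import numpy as np
import pandas as pd

# Values copied from the question's table; labels 38..62 mimic the slice shown there.
values = [-0.56, -0.36, 0.68, 0.24, 0.22, 0.89, 0.95, 1.53, 1.01, 1.73,
          1.39, 0.96, 1.26, 2.37, 1.57, 2.95, 5.7, 5.29, 5.42, 4.99,
          4.45, 4.59, 4.2, 3.12, 1.85]
df = pd.DataFrame({"Index": values}, index=range(38, 63))

# Positions where the slope sign flips from +1 to -1 (a strict local peak).
doublediff = np.diff(np.sign(np.diff(df["Index"].to_numpy())))
peak_locations = np.where(doublediff == -2)[0] + 1

# Positions -> index labels, then flag the matching rows.
idx = df.iloc[peak_locations].index
df["dummy"] = np.where(df.index.isin(idx), 1, 0)

print(peak_locations.tolist())
print(df.loc[df["dummy"] == 1].index.tolist())
```

Going through `df.iloc[...]` matters here because `peak_locations` holds positions (0-based), while the frame's labels start at 38; comparing the positions directly against `df.index` would flag the wrong rows or none at all.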
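Answer 1 (score: 0)
Find the local maxima as the rows where the value is greater than the previous one and the next value is not greater than its own predecessor:
series = df['Index']  # keep the pandas Series; the NumPy array from .values has no .shift()
s = series > series.shift(1)  # True where the value exceeds the previous one
df[s & (s != s.shift(-1))]  # rows that rise and are not followed by another rise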
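The shift-based test can also be turned into the 0/1 column the question asks for. The sketch below is an illustration under a few assumptions: it uses only the first twelve values from the question, the names `peak_mask` and `dummy` are made up here, and `fill_value=False` is passed to `shift` so the shifted mask stays boolean.

```python
import pandas as pd

# First twelve values from the question's table (a shortened sample).
series = pd.Series([-0.56, -0.36, 0.68, 0.24, 0.22, 0.89,
                    0.95, 1.53, 1.01, 1.73, 1.39, 0.96])

# True where a value is greater than the one before it (first row is False).
s = series > series.shift(1)

# A peak rises from its predecessor and is not followed by another rise.
peak_mask = s & (s != s.shift(-1, fill_value=False))

# Convert the boolean mask into a 0/1 dummy column.
dummy = peak_mask.astype(int)
print(dummy.tolist())
```

One difference from the double-difference method in the question: if the series ends on a rising value, this mask flags the last row as a peak, because there is no following value to compare against.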
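Answer 2 (score: 0)
I think you want to use NumPy's fancy indexing to build your array of 1s and 0s. Sequence objects can be used as indexers for NumPy arrays.
Following your example, suppose your DataFrame is 62 rows long. Then: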
>>> peak_locations = [2, 7, 9, 13, 16, 18, 21] # You generated this
>>> dummy = np.zeros(len(df), dtype=int) # I assume length 62 in this example
>>> dummy
array([0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0])
>>> dummy[peak_locations] = 1 # This is the fancy indexing hotness
>>> dummy
array([0, 0, 1, 0, 0, 0, 0, 1, 0, 1, 0, 0, 0, 1, 0, 0, 1, 0, 1, 0, 0, 1,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0])
>>> df["dummy"] = dummy # Adds the new column
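The detection step from the question and the fancy-indexing step above could be wrapped in one small function. This is only a sketch: `peak_dummy` is a hypothetical helper name, and the input list is made-up illustration data rather than the asker's series.

```python
import numpy as np

def peak_dummy(values):
    """Return a 0/1 integer array marking strict local peaks (hypothetical helper)."""
    values = np.asarray(values, dtype=float)
    dummy = np.zeros(len(values), dtype=int)
    if len(values) < 3:
        return dummy  # no interior points, so no peaks by this definition
    # The slope sign goes from +1 to -1 at a strict peak, giving -2.
    doublediff = np.diff(np.sign(np.diff(values)))
    peak_locations = np.where(doublediff == -2)[0] + 1
    dummy[peak_locations] = 1  # fancy indexing with an integer array
    return dummy

result = peak_dummy([1.0, 3.0, 2.0, 2.5, 4.0, 1.0])
print(result.tolist())
```

Because the result has one entry per input value, it can be assigned straight to a column, for example `df["dummy"] = peak_dummy(df["Index"])`. Flat-topped peaks (two equal neighbouring values) are not flagged, since the slope sign passes through 0 instead of jumping from +1 to -1.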