I have a list [0, 1, 0, 1, 1, 1, 0, 0, 1, 1, 0, 1, 0, 0, 1].
0: indicates economic increase.
1: indicates economic decline.
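A recession starts with two consecutive declines (1).
A recession ends with two consecutive increases (0).
In the dataset above I have two recessions: one starting at index 3 and ending at index 5, and one starting at index 8 and ending at index 11.
I am lost on how to approach this with pandas. I would like to identify the indices where each recession starts and ends. Any help would be appreciated.
Here is my Python attempt at a solution.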
import numpy as np

np_decline = np.array([0, 1, 0, 1, 1, 1, 0, 0, 1, 1, 0, 1, 0, 0, 1])
recession_start_flag = 0
recession_end_flag = 0
recession_start = []
recession_end = []
for i in range(len(np_decline) - 1):
    # Two consecutive declines while not in a recession: a recession starts at i.
    if recession_start_flag == 0 and np_decline[i] == 1 and np_decline[i + 1] == 1:
        recession_start.append(i)
        recession_start_flag = 1
    # Two consecutive increases while in a recession: it ended at the previous index.
    if recession_start_flag == 1 and np_decline[i] == 0 and np_decline[i + 1] == 0:
        recession_end.append(i - 1)
        recession_start_flag = 0
print(recession_start)
print(recession_end)
Is there a more pandas-centric way to do this?
Leon
Answer 0 (score: 4)
You can use shift:
import pandas as pd
df = pd.DataFrame([0, 1, 0, 1, 1, 1, 0, 0, 1, 1, 0, 1, 0, 0, 1], columns=['signal'])
df_prev = df.shift(1)['signal']
df_next = df.shift(-1)['signal']
df_next2 = df.shift(-2)['signal']
df.loc[(df_prev != 1) & (df['signal'] == 1) & (df_next == 1), 'start'] = 1
df.loc[(df['signal'] != 0) & (df_next == 0) & (df_next2 == 0), 'end'] = 1
df.fillna(0, inplace=True)
df = df.astype(int)
    signal  start  end
0        0      0    0
1        1      0    0
2        0      0    0
3        1      1    0
4        1      0    0
5        1      0    1
6        0      0    0
7        0      0    0
8        1      1    0
9        1      0    0
10       0      0    0
11       1      0    1
12       0      0    0
13       0      0    0
14       1      0    0
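If only the index positions are needed, they can be read back off the two flag columns. A minimal sketch that rebuilds the same frame with the conditions above; the names `starts` and `ends` are introduced here purely for illustration:

```python
import pandas as pd

signal = [0, 1, 0, 1, 1, 1, 0, 0, 1, 1, 0, 1, 0, 0, 1]
df = pd.DataFrame(signal, columns=['signal'])
prev_ = df['signal'].shift(1)
next_ = df['signal'].shift(-1)
next2 = df['signal'].shift(-2)

# Same start/end conditions as above, written straight into integer columns.
df['start'] = ((prev_ != 1) & (df['signal'] == 1) & (next_ == 1)).astype(int)
df['end'] = ((df['signal'] != 0) & (next_ == 0) & (next2 == 0)).astype(int)

# Index labels of the flagged rows (hypothetical helper names).
starts = df.index[df['start'] == 1].tolist()
ends = df.index[df['end'] == 1].tolist()
print(starts, ends)
```

For the sample data this gives the two pairs from the question, (3, 5) and (8, 11).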
Answer 1 (score: 4)
Use rolling(2):
import pandas as pd
s = pd.Series([0, 1, 0, 1, 1, 1, 0, 0, 1, 1, 0, 1, 0, 0, 1])
I subtract .5 so that the rolling sum is 1 when a recession starts and -1 when it stops.
s2 = s.sub(.5).rolling(2).sum()
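Since 1 and -1 both evaluate to True, I can mask the rolling signal down to the starts and stops and then ffill. Finally, gt(0) turns the positive and negative values into True and False.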
pd.concat([s, s2.mask(~s2.astype(bool)).ffill().gt(0)], axis=1, keys=['signal', 'isRec'])
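One thing to watch: rolling(2) stores each pair's sum on the second row of the pair, so isRec switches on and off one row later than the indices given in the question. A small sketch, assuming that shifting the flag back by one row is the alignment you want (`aligned` is a name made up for this example):

```python
import pandas as pd

s = pd.Series([0, 1, 0, 1, 1, 1, 0, 0, 1, 1, 0, 1, 0, 0, 1])
s2 = s.sub(.5).rolling(2).sum()

# Same expression as above: keep only the +1 / -1 events, forward fill, test the sign.
is_rec = s2.mask(~s2.astype(bool)).ffill().gt(0)

# Assumption: move the flag one row earlier so it lines up with the first row of each pair.
aligned = is_rec.shift(-1, fill_value=False)

print(is_rec[is_rec].index.tolist())
print(aligned[aligned].index.tolist())
```

With the sample data the unshifted flag covers rows 4 to 6 and 9 to 12, while the shifted one covers 3 to 5 and 8 to 11.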
Answer 2 (score: 4)
A similar idea using shift, but writing the result as a single Boolean column:
# Boolean indexers for recession start and stops.
rec_start = (df['signal'] == 1) & (df['signal'].shift(-1) == 1)
rec_end = (df['signal'] == 0) & (df['signal'].shift(-1) == 0)
# Mark the recession start/stops as True/False.
df.loc[rec_start, 'recession'] = True
df.loc[rec_end, 'recession'] = False
# Forward fill the recession column with the last known Boolean.
# Fill any NaN's as False (i.e. locations before the first start/stop).
df['recession'] = df['recession'].ffill().fillna(False)
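To get back to the start and end indices the question asks for, the Boolean column can be compared with its shifted neighbours. This is only a sketch: it rebuilds the column with a float helper Series (`flag`, a name not in the code above) instead of assigning True/False through df.loc, and `starts`/`ends` are illustrative names.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'signal': [0, 1, 0, 1, 1, 1, 0, 0, 1, 1, 0, 1, 0, 0, 1]})
nxt = df['signal'].shift(-1)
rec_start = (df['signal'] == 1) & (nxt == 1)
rec_end = (df['signal'] == 0) & (nxt == 0)

# Same start/stop marking as above, held as floats (1.0 / 0.0 / NaN) before filling.
flag = pd.Series(np.nan, index=df.index)
flag[rec_start] = 1.0
flag[rec_end] = 0.0
recession = flag.ffill().fillna(0.0).astype(bool)

# A start is a True row whose previous row is False; an end is a True row whose next row is False.
before = recession.shift(1, fill_value=False)
after = recession.shift(-1, fill_value=False)
starts = recession.index[recession & ~before].tolist()
ends = recession.index[recession & ~after].tolist()
print(starts, ends)
```

Note that with this approach a recession still running on the last row is reported as ending on that row, which may or may not be what you want.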
Answer 3 (score: 3)
The start of a run of 1s satisfies the condition
x_prev = x.shift(1)
x_next = x.shift(-1)
((x_prev != 1) & (x == 1) & (x_next == 1))
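That is, the value at the start of a run is 1, the previous value is not 1, and the next value is 1. Similarly, the end of a run satisfies the condition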
((x == 1) & (x_next == 0) & (x_next2 == 0))
since the value at the end of a run is 1 and the next two values are 0.
We can use np.flatnonzero to find the indices where these conditions hold:
import numpy as np
import pandas as pd
x = pd.Series([0, 1, 0, 1, 1, 1, 0, 0, 1, 1, 0, 1, 0 , 0 , 1])
x_prev = x.shift(1)
x_next = x.shift(-1)
x_next2 = x.shift(-2)
df = pd.DataFrame(
    dict(start=np.flatnonzero((x_prev != 1) & (x == 1) & (x_next == 1)),
         end=np.flatnonzero((x == 1) & (x_next == 0) & (x_next2 == 0))))
print(df[['start', 'end']])
yields
   start  end
0      3    5
1      8   11
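If the series stops while a recession is still open, the two arrays have different lengths and building the DataFrame from a plain dict of arrays fails. A hedged sketch of one way around that: wrap each array in a Series so pandas pads the shorter column with NaN. The function name `recession_bounds` is made up for this example, and it only covers the open-ended case; it does not re-pair starts and ends when they get out of step for other reasons.

```python
import numpy as np
import pandas as pd

def recession_bounds(values):
    """Start/end positions using the same shift conditions as above (hypothetical helper)."""
    x = pd.Series(values)
    x_prev, x_next, x_next2 = x.shift(1), x.shift(-1), x.shift(-2)
    starts = np.flatnonzero((x_prev != 1) & (x == 1) & (x_next == 1))
    ends = np.flatnonzero((x == 1) & (x_next == 0) & (x_next2 == 0))
    # Series of unequal length are aligned on their index, so a missing end shows up as NaN.
    return pd.DataFrame({'start': pd.Series(starts), 'end': pd.Series(ends)})

full = recession_bounds([0, 1, 0, 1, 1, 1, 0, 0, 1, 1, 0, 1, 0, 0, 1])
open_ended = recession_bounds([0, 1, 1, 0, 0, 1, 1])  # second recession never ends
print(full)
print(open_ended)
```

The end column becomes float as soon as a NaN is present, so cast it back only after deciding how to treat the unfinished recession.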
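Answer 4 (score: 0)
You can use scipy.signal.find_peaks to solve this problem.
import numpy as np
from scipy.signal import find_peaks
np_decline = np.array([0, 1, 0, 1, 1, 1, 0, 0, 1, 1, 0, 1, 0, 0, 1])
# width=2 keeps only peaks that are at least two samples wide.
peaks = find_peaks(np_decline, width=2)
# peaks[1] is the properties dict; 'left_bases' holds the base index to the left of each kept peak.
recession_start_loc = peaks[1]['left_bases'][0]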