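I have a multi-indexed DataFrame. The index is made up of ID and Date, and my three columns are Cost, Revenue and Expenditure.
I want to create three new columns based on certain conditions.
1) The first new column is based on this condition: for the 3 most recent dates of each ID, if the Cost column is consistently decreasing, label the new row value 'NEG'; otherwise label it 'no'.
2) The second new column is based on this condition: for the 3 most recent dates, if the Revenue column is consistently decreasing, label the new row value 'NEG'; otherwise label it 'no'.
3) The third new column is based on this condition: for the 3 most recent dates, if the Expenditure column is consistently increasing, label the new row value 'POS'; if it stays the same, label the new row value 'STABLE'.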
import pandas as pd

idx = pd.MultiIndex.from_product([['001', '002', '003', '004'],
                                  ['2017-06-30', '2017-12-31', '2018-06-30', '2018-12-31', '2019-06-30']],
                                 names=['ID', 'Date'])
col = ['Cost', 'Revenue', 'Expenditure']
dict2 = {'Cost': [12, 6, -2, -10, -16, -10, 14, 12, 6, 7, 4, 2, 1, 4, -4, 5, 7, 9, 8, 1],
         'Revenue': [14, 13, 2, 1, -6, -10, 14, 12, 6, 7, 4, 2, 1, 4, -4, 5, 7, 9, 18, 91],
         'Expenditure': [17, 196, 20, 1, -6, -10, 14, 12, 6, 7, 4, 2, 1, 4, -4, 5, 7, 9, 18, 18]}
df = pd.DataFrame(dict2, idx, col)
I tried writing a function and applying it to my DataFrame, but I keep getting error messages...
The result I am ultimately after looks like this.
idx = pd.MultiIndex.from_product([['001', '002', '003', '004'],
                                  ['2017-06-30', '2017-12-31', '2018-06-30', '2018-12-31', '2019-06-30']],
                                 names=['ID', 'Date'])
col = ['Cost', 'Revenue', 'Expenditure', 'Cost Outlook', 'Revenue Outlook', 'Expenditure Outlook']
dict3 = {'Cost': [12, 6, -2, -10, -16,
                  -10, 14, 12, 6, 7,
                  4, 2, 1, 4, -4,
                  5, 7, 9, 8, 1],
         'Cost Outlook': ['no', 'no', 'NEG', 'NEG', 'NEG',
                          'no', 'no', 'no', 'NEG', 'NEG',
                          'no', 'no', 'NEG', 'no', 'no',
                          'no', 'no', 'no', 'no', 'NEG'],
         'Revenue': [14, 13, 2, 1, -6,
                     -10, 14, 12, 6, 7,
                     4, 2, 1, 4, -4,
                     5, 7, 9, 18, 91],
         'Revenue Outlook': ['no', 'no', 'NEG', 'NEG', 'NEG',
                             'no', 'no', 'no', 'NEG', 'NEG',
                             'no', 'no', 'NEG', 'no', 'no',
                             'no', 'no', 'no', 'no', 'no'],
         'Expenditure': [17, 196, 1220, 1220, -6,
                         -10, 14, 120, 126, 129,
                         4, 2, 1, 4, -4,
                         5, 7, 9, 18, 18],
         'Expenditure Outlook': ['no', 'no', 'POS', 'POS', 'no',
                                 'no', 'no', 'POS', 'POS', 'POS',
                                 'no', 'no', 'no', 'no', 'no',
                                 'no', 'no', 'POS', 'POS', 'STABLE']
         }
df_new = pd.DataFrame(dict3, idx, col)
Answer 0 (score: 0)
Here is what I would do:
import numpy as np

# update Cost and Revenue Outlooks
# because they have similar conditions
for col in ['Cost', 'Revenue']:
    groups = df.groupby('ID')
    outlook = f'{col} Outlook'
    # True where the value fell compared with the previous date of the same ID
    df[outlook] = groups[col].diff().lt(0)
    # two consecutive falls means three consecutively decreasing values
    df[outlook] = np.where(df.groupby('ID')[outlook].rolling(2).sum().eq(2), 'NEG', 'no')

# update Expenditure Outlook
col = 'Expenditure'
outlook = f'{col} Outlook'
s = df.groupby('ID')[col].diff()
df[outlook] = np.select((s.eq(0).groupby(level=0).rolling(2).sum().eq(2),
                         s.gt(0).groupby(level=0).rolling(2).sum().eq(2)),
                        ('STABLE', 'POS'), 'no')
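A note on the snippet above: as far as I can tell, `groupby(...).rolling(...)` returns its result with the group key added as an extra index level, and `np.where` / `np.select` hand back plain arrays, so the labels are written back by position. That lines up here because `df` is already sorted by ID. Below is a minimal sketch of a label-aligned variant using `transform`, on a cut-down two-ID frame. The helper name `streak` and the reduced data are assumptions for illustration, not part of the original answer.

```python
import numpy as np
import pandas as pd

# Cut-down sample: two IDs, four dates each (illustrative subset of the Cost column)
idx = pd.MultiIndex.from_product(
    [["001", "002"], ["2017-06-30", "2017-12-31", "2018-06-30", "2018-12-31"]],
    names=["ID", "Date"],
)
df = pd.DataFrame({"Cost": [12, 6, -2, -10, -10, 14, 12, 6]}, index=idx)


def streak(flags, n=2):
    """True where the last `n` flags are all True (hypothetical helper)."""
    return flags.astype(int).rolling(n).sum().eq(n)


# True where Cost fell compared with the previous date of the same ID
falling = df.groupby(level="ID")["Cost"].diff().lt(0)

# transform keeps the original (ID, Date) index, so the result lines up by label
neg = falling.groupby(level="ID").transform(streak)

df["Cost Outlook"] = np.where(neg, "NEG", "no")
```

Two consecutive negative differences correspond to three consecutively decreasing values, which is why the rolling window is 2 rather than 3.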
Answer 1 (score: 0)
See if this does the job:
import numpy as np

is_descending = lambda a: np.all(a[:-1] > a[1:])
is_ascending = lambda a: np.all(a[:-1] <= a[1:])

df1 = df.reset_index()
# raw=True passes each window as a NumPy array, so the slices are compared by position
df1["CostOutlook"] = df1.groupby("ID").Cost.rolling(3).apply(is_descending, raw=True).fillna(0).apply(lambda x: "NEG" if x > 0 else "no").to_list()
df1["RevenueOutlook"] = df1.groupby("ID").Revenue.rolling(3).apply(is_descending, raw=True).fillna(0).apply(lambda x: "NEG" if x > 0 else "no").to_list()
df1["ExpenditureOutlook"] = df1.groupby("ID").Expenditure.rolling(3).apply(is_ascending, raw=True).fillna(0).apply(lambda x: "POS" if x > 0 else "no").to_list()
df1 = df1.set_index(["ID", "Date"])
Note: the 'STABLE' requirement is not handled.
Edit: here is an alternative solution:
is_descending = lambda a: np.all(a[:-1] > a[1:])

def is_ascending(a):
    # 0 = not ascending, 1 = ascending, 2 = ascending with the last two values equal
    if np.all(a[:-1] <= a[1:]):
        if a[-1] == a[-2]:
            return 2
        return 1
    return 0

for col in ['Cost', 'Revenue']:
    # one column per ID, rolling over dates; raw=True passes each window as a NumPy array
    outlook = df[col].unstack(level="ID").rolling(3).apply(is_descending, raw=True).fillna(0).replace({0.0:"no", 1.0:"NEG"}).unstack().rename(f"{col} outlook")
    df = df.join(outlook)

col = "Expenditure"
outlook = df[col].unstack(level="ID").rolling(3).apply(is_ascending, raw=True).fillna(0).replace({0.0:"no", 1.0:"POS", 2.0:"STABLE"}).unstack().rename(f"{col} outlook")
df = df.join(outlook)
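To see how the three return codes of `is_ascending` play out, here is a small sketch on plain NumPy windows (the form `rolling(...).apply` passes when called with `raw=True`). The sample windows are made up for illustration. Because the comparison is non-strict (`<=`), a window that rises and then flattens comes out as 'STABLE' rather than 'POS'; if that is not the labelling you want, the condition would need adjusting.

```python
import numpy as np


def is_ascending(a):
    """Same return codes as above: 0 = not ascending, 1 = ascending, 2 = last two values equal."""
    if np.all(a[:-1] <= a[1:]):
        if a[-1] == a[-2]:
            return 2
        return 1
    return 0


labels = {0: "no", 1: "POS", 2: "STABLE"}

# Illustrative 3-value windows (made up for this example)
windows = [
    np.array([1.0, 2.0, 3.0]),  # strictly rising
    np.array([1.0, 3.0, 3.0]),  # rising, then flat
    np.array([3.0, 3.0, 3.0]),  # flat throughout
    np.array([1.0, 3.0, 2.0]),  # falls at the end
]

results = [labels[is_ascending(w)] for w in windows]
print(results)
```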