How to create new columns based on specific conditions?

Posted: 2019-10-03 17:33:15

Tags: python-3.x pandas dataframe multi-index

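I have a MultiIndex DataFrame. The index consists of ID and Date, and my 3 columns are Cost, Revenue and Expenditure.

I want to create 3 new columns based on the following conditions.

1) First new column: for each ID, looking at the last 3 dates (the current date and the two before it), if the Cost column decreases consecutively, set the new row value to "NEG"; otherwise set it to "no".

2) Second new column: over the last 3 dates, if the Revenue column decreases consecutively, set the new row value to "NEG"; otherwise set it to "no".

3) Third new column: over the last 3 dates, if the Expenditure column increases consecutively, set the new row value to "POS"; if it stays the same, set it to "STABLE" (the expected output below uses "no" in every other case).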
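The three rules can also be expressed with per-ID shifts instead of a custom function. This is a minimal sketch of one possible approach, not taken from either answer. It assumes "consecutively decreasing/increasing" means strictly monotonic across the current date and the two previous dates of the same ID, and, as an assumption because the question does not define it precisely, it treats "STABLE" as all three values being equal. The helper `window_values` is hypothetical.

```python
import numpy as np
import pandas as pd

idx = pd.MultiIndex.from_product(
    [['001', '002', '003', '004'],
     ['2017-06-30', '2017-12-31', '2018-06-30', '2018-12-31', '2019-06-30']],
    names=['ID', 'Date'])
df = pd.DataFrame(
    {'Cost': [12, 6, -2, -10, -16, -10, 14, 12, 6, 7, 4, 2, 1, 4, -4, 5, 7, 9, 8, 1],
     'Revenue': [14, 13, 2, 1, -6, -10, 14, 12, 6, 7, 4, 2, 1, 4, -4, 5, 7, 9, 18, 91],
     'Expenditure': [17, 196, 20, 1, -6, -10, 14, 12, 6, 7, 4, 2, 1, 4, -4, 5, 7, 9, 18, 18]},
    index=idx)


def window_values(frame, col):
    """Hypothetical helper: (value two dates back, one date back, current) within each ID."""
    g = frame.groupby(level='ID')[col]
    return g.shift(2), g.shift(1), frame[col]


for col in ['Cost', 'Revenue']:
    x2, x1, x0 = window_values(df, col)
    # comparisons with NaN are False, so the first two dates of each ID become 'no'
    df[f'{col} Outlook'] = np.where((x2 > x1) & (x1 > x0), 'NEG', 'no')

x2, x1, x0 = window_values(df, 'Expenditure')
# assumption: POS = strictly increasing, STABLE = all three values equal
df['Expenditure Outlook'] = np.select(
    [(x2 < x1) & (x1 < x0), (x2 == x1) & (x1 == x0)],
    ['POS', 'STABLE'], default='no')
print(df)
```

Because `shift` is applied per ID group, a window never spans two different IDs.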

import pandas as pd

idx = pd.MultiIndex.from_product([['001', '002', '003', '004'],
                                  ['2017-06-30', '2017-12-31', '2018-06-30', '2018-12-31', '2019-06-30']],
                                 names=['ID', 'Date'])
col = ['Cost', 'Revenue', 'Expenditure']

dict2 = {'Cost': [12, 6, -2, -10, -16, -10, 14, 12, 6, 7, 4, 2, 1, 4, -4, 5, 7, 9, 8, 1],
         'Revenue': [14, 13, 2, 1, -6, -10, 14, 12, 6, 7, 4, 2, 1, 4, -4, 5, 7, 9, 18, 91],
         'Expenditure': [17, 196, 20, 1, -6, -10, 14, 12, 6, 7, 4, 2, 1, 4, -4, 5, 7, 9, 18, 18]}

df = pd.DataFrame(dict2, idx, col)

I tried writing a function and applying it to my DataFrame, but I kept getting errors...

The result I want should look like this:
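Since the question mentions trying a custom function: one way to make such a function work is to apply it to rolling 3-date windows within each ID rather than row by row. A minimal sketch for the Cost rule only, assuming "decreasing" means strictly decreasing; the function name `strictly_decreasing` is hypothetical.

```python
import numpy as np
import pandas as pd

idx = pd.MultiIndex.from_product(
    [['001', '002', '003', '004'],
     ['2017-06-30', '2017-12-31', '2018-06-30', '2018-12-31', '2019-06-30']],
    names=['ID', 'Date'])
df = pd.DataFrame(
    {'Cost': [12, 6, -2, -10, -16, -10, 14, 12, 6, 7, 4, 2, 1, 4, -4, 5, 7, 9, 8, 1]},
    index=idx)


def strictly_decreasing(window: np.ndarray) -> float:
    """Hypothetical helper: 1.0 if each value is lower than the one before it, else 0.0."""
    return float(np.all(np.diff(window) < 0))


# transform keeps the original (ID, Date) index, so the result aligns with df;
# raw=True hands each 3-date window to the function as a NumPy array
flags = df.groupby(level='ID')['Cost'].transform(
    lambda s: s.rolling(3).apply(strictly_decreasing, raw=True))

# the first two dates of each ID have no full window (NaN), so they become 'no'
df['Cost Outlook'] = np.where(flags.eq(1), 'NEG', 'no')
print(df)
```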

idx = pd.MultiIndex.from_product([['001', '002', '003', '004'],
                                  ['2017-06-30', '2017-12-31', '2018-06-30', '2018-12-31', '2019-06-30']],
                                 names=['ID', 'Date'])
col = ['Cost', 'Revenue', 'Expenditure', 'Cost Outlook', 'Revenue Outlook', 'Expenditure Outlook']

dict3 = {'Cost': [12, 6, -2, -10, -16,
                  -10, 14, 12, 6, 7,
                  4, 2, 1, 4, -4,
                  5, 7, 9, 8, 1],

         'Cost Outlook': ['no', 'no', 'NEG', 'NEG', 'NEG',
                          'no', 'no', 'no', 'NEG', 'NEG',
                          'no', 'no', 'NEG', 'no', 'no',
                          'no', 'no', 'no', 'no', 'NEG'],

         'Revenue': [14, 13, 2, 1, -6,
                     -10, 14, 12, 6, 7,
                     4, 2, 1, 4, -4,
                     5, 7, 9, 18, 91],

         'Revenue Outlook': ['no', 'no', 'NEG', 'NEG', 'NEG',
                             'no', 'no', 'no', 'NEG', 'NEG',
                             'no', 'no', 'NEG', 'no', 'no',
                             'no', 'no', 'no', 'no', 'no'],

         'Expenditure': [17, 196, 1220, 1220, -6,
                         -10, 14, 120, 126, 129,
                         4, 2, 1, 4, -4,
                         5, 7, 9, 18, 18],

         'Expenditure Outlook': ['no', 'no', 'POS', 'POS', 'no',
                                 'no', 'no', 'POS', 'POS', 'POS',
                                 'no', 'no', 'no', 'no', 'no',
                                 'no', 'no', 'POS', 'POS', 'STABLE']
         }

df_new = pd.DataFrame(dict3, idx, col)

2 answers:

Answer 0 (score: 0)

Here's what I would do:

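import numpy as np

# Cost and Revenue Outlooks share the same rule, so handle them in one loop
for col in ['Cost', 'Revenue']:
    groups = df.groupby('ID')

    outlook = f'{col} Outlook'
    # True where the value dropped compared with the previous date of the same ID
    df[outlook] = groups[col].diff().lt(0)

    # two consecutive drops = three strictly decreasing values;
    # np.where returns a plain array that is assigned by position (df must be sorted by ID)
    df[outlook] = np.where(df.groupby('ID')[outlook].rolling(2).sum().eq(2), 'NEG', 'no')

# Expenditure Outlook: two consecutive zero changes -> STABLE, two consecutive increases -> POS
col = 'Expenditure'
outlook = f'{col} Outlook'
s = df.groupby('ID')[col].diff()

df[outlook] = np.select((s.eq(0).groupby(level=0).rolling(2).sum().eq(2),
                         s.gt(0).groupby(level=0).rolling(2).sum().eq(2)),
                        ('STABLE', 'POS'), 'no')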
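The core trick in this answer is that two consecutive negative differences mean three strictly decreasing values. A small illustration on a single ID's Cost values (ID '002' from the question's data); here the boolean mask is cast to int before the rolling sum, which is a choice made for this sketch only:

```python
import pandas as pd

cost_002 = pd.Series([-10, 14, 12, 6, 7])

drops = cost_002.diff().lt(0)              # F, F, T, T, F (the first diff is NaN -> False)
runs = drops.astype(int).rolling(2).sum()  # NaN, 0, 1, 2, 1
flag = runs.eq(2)                          # only the 4th date closes 3 falling values
print(flag.tolist())  # [False, False, False, True, False]
```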

Answer 1 (score: 0)

See if this does it:

import numpy as np

# raw=True passes each window as a NumPy array, so a[:-1] and a[1:] compare by position
is_descending = lambda a: np.all(a[:-1] > a[1:])
is_ascending = lambda a: np.all(a[:-1] <= a[1:])
df1 = df.reset_index()
df1["CostOutlook"] = df1.groupby("ID").Cost.rolling(3).apply(is_descending, raw=True).fillna(0).apply(lambda x: "NEG" if x > 0 else "no").to_list()
df1["RevenueOutlook"] = df1.groupby("ID").Revenue.rolling(3).apply(is_descending, raw=True).fillna(0).apply(lambda x: "NEG" if x > 0 else "no").to_list()
df1["ExpenditureOutlook"] = df1.groupby("ID").Expenditure.rolling(3).apply(is_ascending, raw=True).fillna(0).apply(lambda x: "POS" if x > 0 else "no").to_list()
df1 = df1.set_index(["ID", "Date"])

Note: the "STABLE" requirement is not handled here.

Edit: here is an alternative solution:

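import numpy as np

is_descending = lambda a: np.all(a[:-1] > a[1:])

# 0 = not non-decreasing, 1 = non-decreasing, 2 = non-decreasing and last two values equal
def is_ascending(a):
    if np.all(a[:-1] <= a[1:]):
        if a[-1] == a[-2]:
            return 2
        return 1
    return 0

# unstack puts each ID in its own column, so rolling(3) runs down the dates of one ID;
# raw=True passes NumPy arrays, so a[-1] and a[-2] are positional
for col in ['Cost', 'Revenue']:
    outlook = df[col].unstack(level="ID").rolling(3).apply(is_descending, raw=True).fillna(0).replace({0.0: "no", 1.0: "NEG"}).unstack().rename(f"{col} Outlook")
    df = df.join(outlook)

col = "Expenditure"
outlook = df[col].unstack(level="ID").rolling(3).apply(is_ascending, raw=True).fillna(0).replace({0.0: "no", 1.0: "POS", 2.0: "STABLE"}).unstack().rename(f"{col} Outlook")
df = df.join(outlook)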
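The edited `is_ascending` encodes three outcomes as 0/1/2, which `replace` then maps to "no"/"POS"/"STABLE". Calling it directly on a few 3-date windows (as NumPy arrays, which is what `raw=True` passes) shows the mapping; the first three windows come from IDs '004' and '003' in the question's data, and the last is a made-up window. Because it uses `<=`, a window that is flat and then rises, such as `[5, 5, 7]`, is also labelled POS.

```python
import numpy as np


def is_ascending(a):
    # 1 = non-decreasing, 2 = non-decreasing with the last two values equal, 0 = otherwise
    if np.all(a[:-1] <= a[1:]):
        if a[-1] == a[-2]:
            return 2
        return 1
    return 0


print(is_ascending(np.array([7, 9, 18])))   # 1 -> 'POS'
print(is_ascending(np.array([9, 18, 18])))  # 2 -> 'STABLE'
print(is_ascending(np.array([1, 4, -4])))   # 0 -> 'no'
print(is_ascending(np.array([5, 5, 7])))    # 1 -> 'POS'
```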