Question

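I'm trying to automate my code, because manually changing the column names in it every time is repetitive work. I'd like to understand how to create something in Python that works like a macro variable in SAS. Any help is much appreciated!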

    ## I'm creating my cutoff points first

    #### I need to assign a value to col1 in a macro in order not to hardcode it all the time!

    cutoff1 = my_data['col1'].describe([.1,.2,.3,.4,.5,.6,.7, .8, 0.9])['10%'].astype('float64')
    cutoff2 = my_data['col1'].describe([.1,.2,.3,.4,.5,.6,.7, .8, 0.9])['20%'].astype('float64')
    cutoff3 = my_data['col1'].describe([.1,.2,.3,.4,.5,.6,.7, .8, 0.9])['30%'].astype('float64')

    ##Then I'm assigning the new values to my continuous variables by using the thresholds I've determined above

    #### I also need to assign a value to COL1_RANK such as %s='COL1' i.e.  %s&'_RANK'

    def f(row):
        if row['col1'] <= cutoff1:
            COL1_RANK = 1
        elif row['col1'] <= cutoff2:
            COL1_RANK = 2
        elif row['col1'] <= cutoff3:
            COL1_RANK = 3
        else:
            COL1_RANK = 4
        return COL1_RANK


    my_data['COL1_RANK'] = my_data.apply(f, axis=1) 

    my_data.head(5)

Answer 1

I think you need to create a custom function using quantile and cut:

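    import numpy as np
    import pandas as pd

    def func(df, input_col, output_col):
        # Cutoffs at the 10th, 20th and 30th percentiles of the input column
        cutoffs = df[input_col].quantile([.1, .2, .3]).astype('float64')
        # Open-ended outer bins so that every value falls into some rank
        bins = [-np.inf, cutoffs[.1], cutoffs[.2], cutoffs[.3], np.inf]
        labels = [1, 2, 3, 4]

        # pd.cut bins are right-closed by default, matching the <= tests in the question
        df[output_col] = pd.cut(df[input_col], bins=bins, labels=labels)
        return df

    my_data = func(my_data, 'col1', 'COL1_RANK')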
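If the goal is to mimic the SAS pattern `&col._RANK` for several columns at once, the same idea can probably be taken one step further by building the output name from the input name with an f-string. The following is only a sketch: `add_rank_columns`, its `quantiles` and `suffix` parameters, and the `demo` frame are hypothetical names made up for illustration, not part of the question or of pandas.

```python
import numpy as np
import pandas as pd


def add_rank_columns(df, columns, quantiles=(0.1, 0.2, 0.3), suffix="_RANK"):
    """Return a copy of df with one <COLUMN><suffix> column per entry in columns.

    Hypothetical helper: the column name plays the role of the SAS macro variable.
    """
    out = df.copy()  # leave the caller's frame untouched
    for col in columns:
        # Quantile cutoffs for this column, in ascending order
        cutoffs = out[col].quantile(list(quantiles))
        # Open-ended outer bins so every value receives a rank
        bins = [-np.inf, *cutoffs.tolist(), np.inf]
        # One label per bin: 1, 2, ..., len(bins) - 1
        labels = list(range(1, len(bins)))
        # Output name derived from the input name, e.g. 'col1' -> 'COL1_RANK'
        out[f"{col.upper()}{suffix}"] = pd.cut(out[col], bins=bins, labels=labels)
    return out


# Made-up demo data, only to show the call
demo = pd.DataFrame({
    "col1": np.arange(1.0, 21.0),
    "col2": np.arange(20.0, 0.0, -1.0),
})
ranked = add_rank_columns(demo, ["col1", "col2"])
print(ranked.head(5))
```

One caveat: with its default `duplicates='raise'`, `pd.cut` raises a `ValueError` when two cutoffs coincide (for example in a column with many repeated values), so such columns would need separate handling.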
