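I'm trying to automate my code because it involves repetitive work: I always have to change the column names in the code by hand. I'm trying to understand how to create a macro variable in Python, like the ones in SAS. Any help is much appreciated!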
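Python has no separate macro language like SAS; an ordinary string variable can hold the column name, and an f-string can build derived names from it. A minimal sketch, assuming a toy DataFrame standing in for `my_data` (the data and the variable names `col` and `rank_col` are hypothetical):

```python
import pandas as pd

# Hypothetical toy data standing in for my_data
my_data = pd.DataFrame({"col1": [5, 1, 9, 3]})

col = "col1"                      # plays the role of SAS: %let col = col1;
rank_col = f"{col.upper()}_RANK"  # plays the role of SAS: %upcase(&col)_RANK

print(rank_col)            # COL1_RANK
print(my_data[col].max())  # 9: the column is selected through the variable, not a literal
```

Changing `col` in one place then changes every column reference built from it.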
```python
import pandas as pd

# my_data is an existing DataFrame (not shown) with a numeric column 'col1'

## I'm creating my cutoff points first
#### I need to put col1 in a macro variable so I don't have to hard-code it every time!
cutoff1 = my_data['col1'].describe([.1, .2, .3, .4, .5, .6, .7, .8, .9])['10%'].astype('float64')
cutoff2 = my_data['col1'].describe([.1, .2, .3, .4, .5, .6, .7, .8, .9])['20%'].astype('float64')
cutoff3 = my_data['col1'].describe([.1, .2, .3, .4, .5, .6, .7, .8, .9])['30%'].astype('float64')

## Then I'm assigning the new values to my continuous variables using the thresholds determined above
#### I also need to build the name COL1_RANK from a variable, e.g. %s='COL1' and then %s&'_RANK'
def f(row):
    if row['col1'] <= cutoff1:
        COL1_RANK = 1
    elif row['col1'] <= cutoff2:
        COL1_RANK = 2
    elif row['col1'] <= cutoff3:
        COL1_RANK = 3
    else:
        COL1_RANK = 4
    return COL1_RANK

my_data['COL1_RANK'] = my_data.apply(f, axis=1)
my_data.head(5)
```
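Staying closer to the question's own `describe()`/`apply()` approach, the hard-coded `'col1'` can simply become a function argument, with the output name derived from it. A minimal sketch under these assumptions: the function name `add_rank` and the toy data are hypothetical, and the function modifies the DataFrame in place as well as returning it.

```python
import pandas as pd

def add_rank(df, col):
    """Hypothetical helper: rank `col` into 4 groups by its 10th/20th/30th percentiles."""
    # Call describe() once instead of once per cutoff
    pct = df[col].describe(percentiles=[.1, .2, .3])
    cutoff1, cutoff2, cutoff3 = pct["10%"], pct["20%"], pct["30%"]

    def f(value):
        # Same <= thresholds as the question's f(row)
        if value <= cutoff1:
            return 1
        elif value <= cutoff2:
            return 2
        elif value <= cutoff3:
            return 3
        return 4

    # Series.apply works element-wise, so apply(axis=1) over whole rows is not needed
    df[f"{col.upper()}_RANK"] = df[col].apply(f)
    return df

# Hypothetical toy data: the values 1..10
my_data = pd.DataFrame({"col1": range(1, 11)})
add_rank(my_data, "col1")
print(my_data["COL1_RANK"].tolist())  # [1, 2, 3, 4, 4, 4, 4, 4, 4, 4]
```

With linear interpolation the cutoffs for 1..10 are 1.9, 2.8 and 3.7, so only the three smallest values fall below the 30th percentile.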
Answer (score: 1):
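```python
import numpy as np
import pandas as pd

def func(df, input_col, output_col):
    # 10th, 20th and 30th percentiles of the input column, computed in one call
    cutoffs = df[input_col].quantile([.1, .2, .3]).astype('float64')
    bins = [-np.inf, cutoffs[.1], cutoffs[.2], cutoffs[.3], np.inf]
    labels = [1, 2, 3, 4]
    # pd.cut bins are right-inclusive by default, matching the `<=` tests in f()
    df[output_col] = pd.cut(df[input_col], bins=bins, labels=labels)
    return df

# Column names are now plain string arguments, so nothing is hard-coded
my_data = func(my_data, 'col1', 'COL1_RANK')
```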
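The same idea extends to several columns and to any list of quantiles. A minimal sketch, assuming numeric columns without missing values and with distinct cutoffs (`pd.cut` rejects bin edges that are not strictly increasing). The name `rank_by_quantiles`, its `qs` parameter and the toy data are hypothetical. Unlike `func`, it converts the result to plain integers, because `pd.cut` returns a categorical column while the question's `f` produced ints.

```python
import numpy as np
import pandas as pd

def rank_by_quantiles(df, input_col, output_col=None, qs=(.1, .2, .3)):
    """Hypothetical generalisation of func(): any quantiles, output name derived from input."""
    if output_col is None:
        output_col = f"{input_col.upper()}_RANK"
    cutoffs = df[input_col].quantile(list(qs)).tolist()
    bins = [-np.inf, *cutoffs, np.inf]
    labels = list(range(1, len(qs) + 2))  # one label per bin: 1..len(qs)+1
    # Right-inclusive bins, then categorical -> int (fails if the column has NaNs)
    df[output_col] = pd.cut(df[input_col], bins=bins, labels=labels).astype(int)
    return df

# Hypothetical toy data with two columns to rank
my_data = pd.DataFrame({"col1": range(1, 11), "col2": range(10, 0, -1)})
for col in ["col1", "col2"]:
    rank_by_quantiles(my_data, col)

print(my_data["COL1_RANK"].tolist())  # [1, 2, 3, 4, 4, 4, 4, 4, 4, 4]
print(my_data["COL2_RANK"].tolist())  # [4, 4, 4, 4, 4, 4, 4, 3, 2, 1]
```

The loop is where the "macro variable" lives: each pass binds `col` to a different column name, and the output name follows automatically.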