My DataFrame looks like this:
```
Major      Sample_size   Men  Women  ShareWomen  Employed  ...  Part_time
Economics           36  2057    282    0.120564      1976  ...        270
French               7   679     77    0.101852       640  ...        170
```
I am trying to define a function like this:
```python
def cutoff(category, cut, direction):
    if direction == 0:
        comply = list(zip(df[df.category < cut].Major, df[df.category < cut].category))
    if direction == 1:
        comply = list(zip(df[df.category > cut].Major, df[df.category > cut].category))
    return comply
```
Here `category` is the variable of interest (for example `Men`, `Employed` or `Part_time`). But it seems I can't pass `category` in as an input variable this way. How can I do this?
Answer 0 (score: 1)
You can use `df[category]`:
```python
def cutoff(category, cut, direction):
    if direction == 0:
        comply = list(zip(df[df[category] < cut].Major, df[df[category] < cut][category]))
    if direction == 1:
        comply = list(zip(df[df[category] > cut].Major, df[df[category] > cut][category]))
    return comply
```
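A minimal sketch of the same idea that builds the boolean mask only once and rejects `direction` values other than 0 or 1 (the version above leaves `comply` unassigned in that case, which raises `UnboundLocalError`). It swaps the `if` branches for `operator.lt`/`operator.gt` from the standard library; the `_COMPARE` mapping is a hypothetical helper, and the two-row sample DataFrame is built only from the rows shown in the question.

```python
import operator

import pandas as pd

# Sample data: the two rows and a few of the columns shown in the question
df = pd.DataFrame({
    "Major": ["Economics", "French"],
    "Men": [2057, 679],
    "Employed": [1976, 640],
    "Part_time": [270, 170],
})

# Hypothetical helper mapping: direction 0 -> "below cut", 1 -> "above cut"
_COMPARE = {0: operator.lt, 1: operator.gt}


def cutoff(category, cut, direction):
    if direction not in _COMPARE:
        raise ValueError("direction must be 0 (below) or 1 (above)")
    # df[category] selects the column whose name is the *value* of category;
    # the comparison produces a boolean Series used as a row mask
    mask = _COMPARE[direction](df[category], cut)
    selected = df[mask]
    return list(zip(selected["Major"], selected[category]))
```

For example, `cutoff("Employed", 1000, 1)` keeps only the Economics row, and `cutoff("Part_time", 200, 0)` keeps only the French row.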
Answer 1 (score: 1)
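You can access a pandas DataFrame column either as an attribute or as an item. An attribute name has to be known when you write the code, whereas an item key is a string and can therefore come from a variable.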
```python
df.Major      # attribute access
df['Major']   # item access
```
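To see why `df.category` fails inside the question's function, here is a short sketch with a small two-row frame (values taken from the question): item access resolves the variable to the string `"Employed"`, while attribute access looks for a column literally named `category`.

```python
import pandas as pd

df = pd.DataFrame({"Major": ["Economics", "French"], "Employed": [1976, 640]})
category = "Employed"

# Item access: the key is an ordinary string, so it can come from a variable
employed = df[category].tolist()
print(employed)  # [1976, 640]

# Attribute access: the name is fixed in the source code, so this looks for
# a column literally called "category", which does not exist
try:
    df.category
except AttributeError as exc:
    print("AttributeError:", exc)
```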
I'd suggest separating the cutoff from the conversion to a list:
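```python
def cutoff(df, category, cut, direction):
    # Keep rows below the cut when direction == 0, above it otherwise
    mask = df[category] < cut if direction == 0 else df[category] > cut
    return df[mask]

def get_list(df, category):
    # Pair each Major with its value in the chosen column
    return list(zip(df.Major, df[category]))

get_list(cutoff(df, 'Employed', 1000, 1), 'Employed')
```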
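One benefit of having `cutoff` return a DataFrame is that filters can be chained before converting to a list. A sketch assuming a small sample frame built from the question's rows; the threshold of 300 on `Part_time` is an arbitrary value chosen for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    "Major": ["Economics", "French"],
    "Employed": [1976, 640],
    "Part_time": [270, 170],
})


def cutoff(df, category, cut, direction):
    # Keep rows below the cut when direction == 0, above it otherwise
    mask = df[category] < cut if direction == 0 else df[category] > cut
    return df[mask]


def get_list(df, category):
    # Pair each Major with its value in the chosen column
    return list(zip(df.Major, df[category]))


# Employed > 1000, then Part_time < 300, and only then convert to pairs
filtered = cutoff(cutoff(df, "Employed", 1000, 1), "Part_time", 300, 0)
pairs = get_list(filtered, "Part_time")
```

Keeping the filtering and the conversion separate also means the intermediate result can still be inspected or sorted as a DataFrame.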