New field in a dataset using pandas

Time: 2015-03-10 15:52:51

Tags: python pandas

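I'm trying to make the switch from SAS to Python, and I'm enjoying it. I found a great SQL-to-pandas guide that has helped me a lot, but there are some things I do in SAS that I'm not sure how to do in pandas: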

/* looks at variable within a dataset and assigns it to high, med, low */
data scores;
    set scores;
    if score_value >= 80 then score_cat = "high";
    else if score_value >= 50 then score_cat = "med";
    else score_cat = "low";
run;



/* looks at the value of a particular variable and deletes the record */
data people;
    set people;
    if trim(name)="" then delete;
run;

I'm sure this is easy to do, but I'm just not seeing it yet.

Thanks! JT

1 answer:

Answer 0 (score: 1)

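  • To categorize values, you can use pandas.cut.

  • To drop rows that contain an empty string, build a boolean mask such as df['people'] != '' and select the matching rows with df[...] or df.loc[...]: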


import numpy as np
import pandas as pd

# 'people' is listed first so the column order matches the output shown below
df = pd.DataFrame({'people':['', 'A', 'B', '', 'D', '', 'F'],
                   'score':[0,1,49,50,80,81,100]})
#   people  score
# 0             0
# 1      A      1
# 2      B     49
# 3            50
# 4      D     80
# 5            81
# 6      F    100


df['cat'] = pd.cut(df['score'], bins=[0,50,80,100], include_lowest=True,
                   labels=['low', 'med', 'high'])

df = df[df['people'] != '']
print(df)

yields

  people  score   cat
1      A      1   low
2      B     49   low
4      D     80   med
6      F    100  high
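
Note that in the output above a score of exactly 80 is labelled med, because pandas.cut closes each bin on the right by default. The SAS code in the question uses >= comparisons, so there 80 would be high and 50 would be med. The following is a minimal sketch of one way to mirror the SAS logic more closely. It assumes open-ended outer bins (the -np.inf and np.inf edges are my choice, not part of the answer) and uses a small made-up frame that includes a whitespace-only name to imitate trim(name)="":

```python
import numpy as np
import pandas as pd

# Hypothetical sample data; the '  ' entry is a name made only of spaces.
df = pd.DataFrame({'people': ['', 'A', '  ', 'B', 'D', 'F'],
                   'score': [0, 1, 49, 50, 80, 100]})

# right=False closes each bin on the left: [-inf, 50), [50, 80), [80, inf).
# That corresponds to: score >= 80 -> high, score >= 50 -> med, else low.
df['cat'] = pd.cut(df['score'], bins=[-np.inf, 50, 80, np.inf], right=False,
                   labels=['low', 'med', 'high'])

# str.strip() removes surrounding whitespace before the comparison, so
# whitespace-only names are dropped along with truly empty strings.
df = df[df['people'].str.strip() != '']
print(df)
```

With these settings 50 is labelled med and 80 is labelled high, and only the rows for A, B, D and F remain. Missing names (NaN) are not removed by this mask, so they would need separate handling, for example with dropna.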