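I'm in the process of switching from SAS to Python, and I'm enjoying it. I found a good SQL-to-pandas guide that has helped me a lot, but there are a few things I do in SAS that I'm not sure how to do in pandas: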
/* looks at variable within a dataset and assigns it to high, med, low */
data scores;
set scores;
if score_value >= 80 then score_cat = "high";
else if score_value >= 50 then score_cat = "med";
else score_cat = "low";
run;
/* looks at the value of a particular variable and deletes the record */
data people;
set people;
if trim(name)="" then delete;
run;
I'm sure this is easy to do, but I just don't see it yet.
Thanks! JT
Answer 0 (score: 1)
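To categorize the values, you can use pandas.cut.
To remove rows that contain an empty string, create a boolean mask such as df['people'] != '' and use it to select rows with df[...] or df.loc[...]: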
import numpy as np
import pandas as pd
df = pd.DataFrame({'score':[0,1,49,50,80,81,100],
                   'people':['', 'A', 'B', '', 'D', '', 'F']})
#   people  score
# 0             0
# 1      A      1
# 2      B     49
# 3            50
# 4      D     80
# 5            81
# 6      F    100
df['cat'] = pd.cut(df['score'], bins=[0,50,80,100], include_lowest=True,
                   labels=['low', 'med', 'high'])
df = df[df['people'] != '']
print(df)
yields
  people  score   cat
1      A      1   low
2      B     49   low
4      D     80   med
6      F    100  high
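One thing to watch for: pd.cut closes each bin on the right by default, so in the output above a score of 80 lands in med (and 50 would land in low), whereas the SAS step tests score_value >= 80 and score_value >= 50. If the SAS boundaries need to be matched exactly, something like the following may work. It is only a sketch, and it assumes that open-ended outer bins (-inf and inf) are acceptable and that whitespace-only names should be dropped, as trim(name)="" would do; the sample data is the same toy frame with one blank-only name added for illustration.

```python
import numpy as np
import pandas as pd

# Toy data from the answer; index 3 uses a single space (an assumption,
# added here to illustrate the trim() behaviour).
df = pd.DataFrame({'score': [0, 1, 49, 50, 80, 81, 100],
                   'people': ['', 'A', 'B', ' ', 'D', '', 'F']})

# right=False makes each bin closed on the left:
# [-inf, 50), [50, 80), [80, inf), mirroring the SAS >= comparisons.
df['cat'] = pd.cut(df['score'], bins=[-np.inf, 50, 80, np.inf],
                   right=False, labels=['low', 'med', 'high'])

# str.strip() removes surrounding whitespace before the comparison,
# so names made up only of blanks are dropped along with empty strings.
df = df[df['people'].str.strip() != '']
print(df)
```

With these bin edges, 80 and 100 are both categorized as high and 50 would be med. Rows whose score is NaN keep a NaN category rather than being assigned a label.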