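My goal is to transfer one column from df1 to df2 while binning it. I have a DataFrame named df1 that contains 3 numeric variables. I want to take the variable 'tenure' into df2 and create bins from it. The column values are transferred to df2, but df2 shows some missing values. Please find the code below: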
df2=pd.cut(df1["tenure"] , bins=[0,20,60,80], labels=['low','medium','high'])
Before creating df2, I checked df1 for missing values. There were none, but after creating the bins, 11 missing values show up.
print(df2.isnull().sum())
The code above reports 11 missing values.
Any help is appreciated.
Answer 0: (score: 1)
I assume you have some values in df1['tenure'] that are not in (0, 80], maybe zeros. See the example below:
import pandas as pd
df1 = pd.DataFrame({'tenure':[-1, 0, 12, 34, 78, 80, 85]})
print(pd.cut(df1["tenure"], bins=[0,20,60,80], labels=['low','medium','high']))
0 NaN # -1 is lower than 0 so result is null
1 NaN # it was 0 but the segment is open on the lowest bound so 0 gives null
2 low
3 medium
4 high
5 high # 80 is kept as the segment is closed on the right
6 NaN # 85 is higher than 80 so result is null
Name: tenure, dtype: category
Categories (3, object): [low < medium < high]
Now, you can pass the parameter include_lowest=True to pd.cut to include the left bound of the first bin in the result:
print (pd.cut(df1["tenure"] , bins=[0,20,60,80], labels=['low','medium','high'],
include_lowest=True))
0 NaN
1 low # now where the value was 0 you get low and not null
2 low
3 medium
4 high
5 high
6 NaN
Name: tenure, dtype: category
Categories (3, object): [low < medium < high]
So finally, I think that if you print len(df1[(df1.tenure <= 0) | (df1.tenure > 80)]) on your data, you will get 11, the number of null values in df2 (here it is 3).
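To check this on your own data, a minimal sketch along the following lines may help. It reuses the sample values from the example above as a stand-in for the real df1 (an assumption, since the actual data is not shown), and the names binned, out_of_range and offending are only illustrative. It compares the number of nulls produced by pd.cut with the number of values outside (0, 80] and lists the values responsible.

```python
import pandas as pd

# Sample data from the example above, standing in for the real df1 (assumption)
df1 = pd.DataFrame({"tenure": [-1, 0, 12, 34, 78, 80, 85]})
bins = [0, 20, 60, 80]

# Same call as in the question: intervals are (0, 20], (20, 60], (60, 80]
binned = pd.cut(df1["tenure"], bins=bins, labels=["low", "medium", "high"])

# Values at or below the first edge, or above the last edge, fit no bin
out_of_range = (df1["tenure"] <= bins[0]) | (df1["tenure"] > bins[-1])

n_missing = int(binned.isna().sum())
n_out_of_range = int(out_of_range.sum())
offending = df1.loc[out_of_range, "tenure"].tolist()

print(n_missing, n_out_of_range)  # the two counts should match
print(offending)                  # the values that ended up as NaN
```

If the two counts match on your data, the missing values come from the bin edges rather than from df1 itself. You can then either pass include_lowest=True (if the offending values are zeros) or widen the bins to cover the full range of tenure.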