Pandas categorical not working as expected

Time: 2017-01-25 01:51:47

Tags: python pandas

I am trying to understand how ordinal scales (categoricals) work in pandas.

import pandas as pd
import numpy as np

student = ["alex","bob","cynthia","daniel","evans"]
tshirt = ["L","XL","S","M","L"]
df = pd.DataFrame(data = tshirt, index=student)
df = df.rename(columns={0:"tshirt"})

       tshirt
  alex    L
  bob     XL
  cynthia S
  daniel  M
  evans   L

df = df["tshirt"].astype("category", categories = ["S","M","L","XL"],ordered = True)
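For readers on a recent pandas version: passing categories= and ordered= directly to astype, as in the line above, was deprecated in favor of a CategoricalDtype object and no longer works in current pandas. A minimal sketch of the same conversion, assuming pandas 0.24 or later (where pd.CategoricalDtype is available at the top level); the names size_dtype and s are illustrative, and the result is kept in s instead of overwriting df, since the converted column is a Series rather than a DataFrame:

```python
import pandas as pd

student = ["alex", "bob", "cynthia", "daniel", "evans"]
tshirt = ["L", "XL", "S", "M", "L"]

# Build the DataFrame with the column already named "tshirt"
df = pd.DataFrame({"tshirt": tshirt}, index=student)

# Ordered categorical dtype (illustrative name): S < M < L < XL
size_dtype = pd.CategoricalDtype(categories=["S", "M", "L", "XL"], ordered=True)

# Convert the column; the result is a Series, so it gets its own name
s = df["tshirt"].astype(size_dtype)

print(s.cat.ordered)            # True
print(list(s.cat.categories))   # ['S', 'M', 'L', 'XL']
```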

When I try the following code, it returns True.

df.loc["alex"] < df.loc["daniel"]

It should be False (because L > M).

What is wrong with my code?

1 answer:

Answer 0 (score: 1)

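First, your df is actually a Series... but anyway, the problem is that df.loc["alex"] and df.loc["daniel"] return the bare values, which are strings. Strings have an inherent (lexicographic) ordering, and that is the ordering Python uses, so "L" < "M" is True. You need to select the data in a way that returns a pandas data structure: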

In [2]: df[['alex']] < df[['daniel']]
Out[2]:
alex    False
Name: tshirt, dtype: bool

In [3]: df.loc[['alex']] < df.loc[['daniel']]
Out[3]:
alex    False
Name: tshirt, dtype: bool
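A minimal sketch of the diagnosis above and of a few order-aware alternatives, assuming a recent pandas version (with pd.CategoricalDtype); the names size_dtype, naive, by_codes, vs_scalar and positional are illustrative. Note that on current pandas, comparing two Series with different index labels, as in In [2] and In [3], is expected to raise ValueError ("Can only compare identically-labeled Series objects"), so the sketch instead compares the integer category codes, compares a Series against a scalar category, or compares the underlying Categorical arrays positionally:

```python
import pandas as pd

size_dtype = pd.CategoricalDtype(categories=["S", "M", "L", "XL"], ordered=True)
s = pd.Series(["L", "XL", "S", "M", "L"],
              index=["alex", "bob", "cynthia", "daniel", "evans"],
              name="tshirt").astype(size_dtype)

# Scalar access returns the bare category value (a str), so this is a
# plain lexicographic string comparison: "L" < "M" is True.
naive = s.loc["alex"] < s.loc["daniel"]

# 1) Integer codes follow the category order: S=0, M=1, L=2, XL=3.
codes = s.cat.codes
by_codes = codes.loc["alex"] < codes.loc["daniel"]   # 2 < 1

# 2) An ordered categorical Series compared with a scalar that is one of
#    its categories uses the category order.
vs_scalar = s.loc[["alex"]] < s.loc["daniel"]

# 3) Categorical arrays with the same dtype compare position by position,
#    with no index alignment involved.
positional = s.loc[["alex"]].array < s.loc[["daniel"]].array

print(naive)        # True
print(by_codes)     # False
print(vs_scalar)
print(positional)   # [False]
```

Comparing codes is the most explicit option; comparing against a scalar only works when the scalar is one of the declared categories.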