I am trying to understand how ordinal (categorical) scales work in pandas.
import pandas as pd
import numpy as np
student = ["alex","bob","cynthia","daniel","evans"]
tshirt = ["L","XL","S","M","L"]
df = pd.DataFrame(data = tshirt, index=student)
df = df.rename(columns={0:"tshirt"})
        tshirt
alex         L
bob         XL
cynthia      S
daniel       M
evans        L
df = df["tshirt"].astype("category", categories = ["S","M","L","XL"],ordered = True)
When I try the following code, it evaluates to True.
df.loc["alex"] < df.loc["daniel"]
It should be False (because L > M).
What is wrong with my code?
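Answer 0 (score: 1)
First of all, your df is actually a Series... but in any case, the problem is that you are comparing the values themselves. Those are strings, which have an inherent (lexicographic) ordering, and that is the comparison Python is performing. You need to select the data in a way that returns a pandas data structure: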
In [2]: df[['alex']] < df[['daniel']]
Out[2]:
alex False
Name: tshirt, dtype: bool
or
In [3]: df.loc[['alex']] < df.loc[['daniel']]
Out[3]:
alex False
Name: tshirt, dtype: bool
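To make the difference concrete, here is a minimal sketch rather than a definitive recipe. It assumes a reasonably recent pandas release, where the ordered categories are declared through pd.CategoricalDtype instead of keyword arguments to astype. The names size_dtype, plain, by_code and by_values are made up for illustration. Note also that, as far as I know, recent pandas versions raise a ValueError when two Series with different index labels are compared, so the sketch compares the integer category codes and the underlying Categorical arrays instead of two one-element Series.

```python
import pandas as pd

students = ["alex", "bob", "cynthia", "daniel", "evans"]
sizes = ["L", "XL", "S", "M", "L"]

# Ordered categorical dtype declaring S < M < L < XL
size_dtype = pd.CategoricalDtype(categories=["S", "M", "L", "XL"], ordered=True)
tshirt = pd.Series(sizes, index=students, name="tshirt").astype(size_dtype)

# Scalar lookup returns a plain Python str, so this is "L" < "M",
# a lexicographic string comparison that ignores the category order
plain = tshirt.loc["alex"] < tshirt.loc["daniel"]

# The integer codes follow the declared category order (S=0, M=1, L=2, XL=3)
codes = tshirt.cat.codes
by_code = codes.loc["alex"] < codes.loc["daniel"]

# List-style selection keeps the categorical dtype; .values gives a Categorical,
# and two Categoricals with the same ordered categories compare by that order
by_values = (tshirt.loc[["alex"]].values < tshirt.loc[["daniel"]].values)[0]

# Comparing the whole Series against one category also respects the order
smaller_than_l = tshirt < "L"

print(plain, by_code, by_values)
print(smaller_than_l)
```

Only the first comparison falls back to string ordering; the other three use the order given in categories, so they agree with the intuition that L is not smaller than M.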