Question

I have a pandas MultiIndex DataFrame that is indexed by floats. Consider the following example:

import numpy as np
import pandas as pd

arrays = [[0.21,0.21,0.21,0.22,0.22,0.22,0.23,0.23,0.23],
          [0.81,0.8200000000000001,0.83,0.81,0.8200000000000001,0.83,0.81,0.8200000000000001,0.83]]
df = pd.DataFrame(np.random.randn(9, 2), index=arrays)

df

#               0           1
# 0.21  0.81    -2.234036   -0.145643
#       0.82    0.367248    -1.471617
#       0.83    -0.764520   0.686241
# 0.22  0.81    1.380429    1.546513
#       0.82    1.230707    1.826980
#       0.83    -1.198403   0.377323
# 0.23  0.81    -0.418367   -0.125763
#       0.82    0.682860    -0.119080
#       0.83    -1.802418   0.357573

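This is the form in which I am given the DataFrame. Now, if I try to retrieve the entry df.loc[(0.21, 0.82)], I get a KeyError, because the index does not actually contain 0.82 but 0.8200000000000001. I don't know in advance where such values occur in the index. How can I solve this? My idea is to round both levels of the MultiIndex to the meaningful number of decimal places, 2 in this case. But how would I do that? Is there a better solution?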
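The failure can be reproduced with a minimal sketch. It uses the same index values as the question but, as an assumption for illustration, deterministic data from np.arange instead of np.random.randn so the rows are easy to check:

```python
import numpy as np
import pandas as pd

# Same index values as in the question; deterministic data instead of
# np.random.randn so the rows are easy to check (illustrative assumption).
arrays = [[0.21, 0.21, 0.21, 0.22, 0.22, 0.22, 0.23, 0.23, 0.23],
          [0.81, 0.8200000000000001, 0.83] * 3]
df = pd.DataFrame(np.arange(18.0).reshape(9, 2), index=arrays)

# 0.82 and 0.8200000000000001 are two different doubles, so they never compare equal.
print(0.82 == 0.8200000000000001)   # False
print(df.index.levels[1].tolist())  # [0.81, 0.8200000000000001, 0.83]

# Exact label lookup therefore misses the row.
try:
    df.loc[(0.21, 0.82)]
    found = True
except KeyError:
    found = False
print(found)  # False
```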

Answer 1

Consider using integers instead: multiply the floats by 100 (or 1000) and convert them to integers:

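# Round before casting: astype(int) truncates, and e.g. 0.57 * 100 is 56.99999999999999.
# from_arrays keeps each row's own labels; from_product would assume a full, sorted grid.
df.index = pd.MultiIndex.from_arrays([
             np.rint(df.index.get_level_values(0).to_numpy() * 100).astype(int),
             np.rint(df.index.get_level_values(1).to_numpy() * 100).astype(int)])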

Unlike floats, integers are represented and compared exactly. You can now access the data with df.loc[(21, 82)].
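A self-contained sketch of the integer-key idea, wrapped in a hypothetical helper `to_int_keys` (not part of pandas). The scale factor of 100 assumes two significant decimal places, as in the question, and the data is deterministic for illustration:

```python
import numpy as np
import pandas as pd

arrays = [[0.21, 0.21, 0.21, 0.22, 0.22, 0.22, 0.23, 0.23, 0.23],
          [0.81, 0.8200000000000001, 0.83] * 3]
df = pd.DataFrame(np.arange(18.0).reshape(9, 2), index=arrays)


def to_int_keys(index: pd.MultiIndex, scale: int = 100) -> pd.MultiIndex:
    """Hypothetical helper: scale every level and round to the nearest integer.

    Rounding before the cast matters: int(0.57 * 100) is 56, because
    0.57 * 100 evaluates to 56.99999999999999 in binary floating point.
    """
    return pd.MultiIndex.from_arrays(
        [np.rint(index.get_level_values(i).to_numpy() * scale).astype(int)
         for i in range(index.nlevels)],
        names=index.names,
    )


df.index = to_int_keys(df.index)
print(df.loc[(21, 82)].tolist())  # [2.0, 3.0]
```

The cost of this design is that callers must scale their own keys the same way (0.82 becomes 82), so the factor should be chosen once and kept in one place.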

Answer 2

You can use rename to apply a function to every value of the MultiIndex:

df = df.rename(index=lambda val: round(val, 2))

print(df.loc[(.21, .82)])
0    0.260015
1   -0.233822
Name: (0.21, 0.82), dtype: float64
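A self-contained version of the rename approach, again with deterministic data as an illustrative assumption. When given a callable, rename applies it to every label on every level of the MultiIndex, so rounding 0.8200000000000001 yields exactly the same double as the literal 0.82:

```python
import numpy as np
import pandas as pd

arrays = [[0.21, 0.21, 0.21, 0.22, 0.22, 0.22, 0.23, 0.23, 0.23],
          [0.81, 0.8200000000000001, 0.83] * 3]
df = pd.DataFrame(np.arange(18.0).reshape(9, 2), index=arrays)

# Round every label on every level to 2 decimals.
df = df.rename(index=lambda val: round(val, 2))

print(df.index.get_level_values(1).tolist()[:3])  # [0.81, 0.82, 0.83]
print(df.loc[(0.21, 0.82)].tolist())              # [2.0, 3.0]
```

One caveat: if two distinct raw values round to the same key, the renamed index contains duplicate labels, and df.loc then returns all matching rows instead of a single one.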

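However, because of how floating-point numbers are represented (see https://docs.python.org/3/tutorial/floatingpoint.html), I'm not sure it is wise to use floats as exact keys. A brief example: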

>>> .1 + .1 + .1 == .3
False
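A small sketch of why rounding helps here: rounding both sides to the same number of decimals maps nearby doubles onto the same value, and math.isclose is the standard-library alternative when a tolerance comparison is acceptable:

```python
import math

total = 0.1 + 0.1 + 0.1
print(repr(total))                           # 0.30000000000000004
print(total == 0.3)                          # False

# Rounding both sides to the same number of decimals makes them equal again,
# which is what the rename/round approach relies on.
print(round(total, 2) == 0.3)                # True
print(round(0.8200000000000001, 2) == 0.82)  # True

# Alternatively, compare with a relative tolerance instead of exact equality.
print(math.isclose(total, 0.3))              # True
```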

I'd be curious what others think about this, though, since I'm not sure what real-world problems you might run into.

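You can always convert the floats to fixed-precision strings and then index the DataFrame by those strings, which makes the lookup exact: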

df = df.rename(index="{:.2f}".format)

print(df.loc[("0.21", "0.82")]) # note that the leading 0 is important here now
0    0.260015
1   -0.233822
Name: (0.21, 0.82), dtype: float64
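A self-contained sketch of the string-key approach, assuming the same deterministic data as above. To avoid hand-typing keys such as "0.21" versus ".21", it formats both the index labels and the lookup keys with one shared function; the name `fmt` is a hypothetical choice:

```python
import numpy as np
import pandas as pd

arrays = [[0.21, 0.21, 0.21, 0.22, 0.22, 0.22, 0.23, 0.23, 0.23],
          [0.81, 0.8200000000000001, 0.83] * 3]
df = pd.DataFrame(np.arange(18.0).reshape(9, 2), index=arrays)

# Hypothetical helper: one format function for labels and lookup keys alike.
fmt = "{:.2f}".format

df = df.rename(index=fmt)
print(df.index.get_level_values(1).tolist()[:3])  # ['0.81', '0.82', '0.83']

row = df.loc[(fmt(0.21), fmt(0.82))]
print(row.tolist())  # [2.0, 3.0]
```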
