Question

I have a strange dataset:

   year   firms  age  survival
0  1977  564918    0       NaN
2  1978  503991    0       NaN
3  1978  413130    1  0.731310
5  1979  497805    0       NaN
6  1979  390352    1  0.774522

I cast the dtype of the first three columns to integer:

>>> df.dtypes
year          int64
firms         int64
age           int64
survival    float64

But now I want to look up a row in another table, using a value from this one as the index:

idx = 331
otherDf.loc[df.loc[idx, 'age']]
Traceback (most recent call last):
(...)
KeyError: 8.0

This comes from

df.loc[idx, 'age']
8.0

Why does this keep returning a float? How can I perform the lookup in otherDf? I am on pandas version 0.15.

Answer 1

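You get a float back because each row contains a mix of float and int types. When a row is selected by index with loc, the integers are cast to floats: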

>>> df.loc[4]
year          1979.000000
firms       390352.000000
age              1.000000
survival         0.774522
Name: 4, dtype: float64

So selecting the age entry with df.loc[4, 'age'] yields 1.0.

To work around this and get an integer back, use loc to select from the age column rather than from the whole DataFrame:

>>> df['age'].loc[4]
1
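The upcasting described above can be reproduced with a small frame. This is a minimal sketch on hypothetical data (the values and the variable names `row`, `from_row`, and `from_col` are made up for illustration); it only shows that a whole row is returned as a single-dtype Series, while a column keeps its own dtype.

```python
import numpy as np
import pandas as pd

# Hypothetical frame: two integer columns and one float column.
df = pd.DataFrame({
    "year": [1978, 1979],
    "age": [1, 8],
    "survival": [0.731310, 0.774522],
})

row = df.loc[1]              # whole row as a Series: one dtype for all values
from_row = row["age"]        # the integer was upcast along with the row
from_col = df["age"].loc[1]  # column first: the int64 dtype is preserved

print(row.dtype, type(from_row).__name__, type(from_col).__name__)
```

Selecting the column first avoids building the mixed-type row at all, which is why the integer survives.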

Answer 2

This is a bug in pandas as of version 0.19. It appears to have been fixed in version 0.20. See https://github.com/pandas-dev/pandas/issues/11617

Answer 3

Do you have to use loc on the whole DataFrame? How about:

otherDf.loc[df['age'][idx]]

Grabbing the value through the 'age' Series returns the appropriate type (int64).
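A minimal sketch of this lookup end to end. The contents of `other_df` (a `label` column indexed by age) and the row labels 330 and 331 are assumptions made for illustration; the question does not show what otherDf contains.

```python
import numpy as np
import pandas as pd

# Hypothetical source frame with an integer and a float column.
df = pd.DataFrame(
    {"age": [1, 8], "survival": [0.73, 0.77]},
    index=[330, 331],
)

# Hypothetical lookup table, indexed by age.
other_df = pd.DataFrame({"label": ["young", "old"]}, index=[1, 8])

idx = 331
key = df["age"][idx]        # goes through the int64 column, so key is an integer
result = other_df.loc[key]  # row of other_df whose index label equals key

print(type(key).__name__, result["label"])
```

If the value has already been pulled out as a float, casting it with `int(...)` before the lookup is another option, provided the column is known to hold whole numbers.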

Answer 4

I cannot reproduce this behavior with pandas 0.15.1.

>>> pd.__version__
'0.15.1'
>>> df = pd.DataFrame({"age": [1,8]})
>>> df
   age
0    1
1    8
>>> df.dtypes
age    int64
dtype: object
>>> df.loc[1, "age"]
8
>>> type(df.loc[1, "age"])
<type 'numpy.int64'>

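I could not find a relevant entry in the changelog, but it would help to know whether you are on 0.15.0 or a newer release.

Edit

Adding another column with a float type does indeed cause the values in a row to be upcast to float (as ajcr points out in his answer):

>>> df = pd.DataFrame({"age": [1, 8], "greatness": [0.2, 1.7]})
>>> type(df.loc[1, "age"])
<type 'numpy.float64'>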

Answer 5

Nowadays, you can use df.at[idx, 'age'] when you need a single value.
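A short sketch of the scalar accessor, assuming a reasonably recent pandas release; the frame and its row labels are hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical mixed-dtype frame with integer row labels.
df = pd.DataFrame(
    {"age": [1, 8], "survival": [0.73, 0.77]},
    index=[330, 331],
)

idx = 331
value = df.at[idx, "age"]  # single-cell access by row label and column label

print(type(value).__name__, value)
```

Because `at` addresses exactly one cell, it does not need to build a row Series first, so the column's integer dtype is kept.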

dtype: integer, but loc returns float
