Question

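I am creating a small Pandas DataFrame and adding some data to it that should be integers. However, even though I try very hard to set the dtype explicitly to int and supply only int values, it always ends up as float. This makes no sense to me, and the behavior doesn't even seem entirely consistent.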

Consider the following Python script:

import pandas as pd

df = pd.DataFrame(columns=["col1", "col2"])  # No dtype specified.
print(df.dtypes)  # dtypes are object, since there is no information yet.
df.loc["row1", :] = int(0)  # Add integer data.
print(df.dtypes)  # Both columns have now become int64, as expected.
df.loc["row2", :] = int(0)  # Add more integer data.
print(df.dtypes)  # Both columns are now float64???
print(df)  # Shows as 0.0.

# Let's try again, but be more specific.
del df
df = pd.DataFrame(columns=["col1", "col2"], dtype=int)  # Explicitly set dtype.
print(df.dtypes)  # For some reason both columns are already float64???
df.loc["row1", :] = int(0)
print(df.dtypes)  # Both columns still float64.

# Output:
"""
col1    object
col2    object
dtype: object
col1    int64
col2    int64
dtype: object
col1    float64
col2    float64
dtype: object
      col1  col2
row1   0.0   0.0
row2   0.0   0.0
col1    float64
col2    float64
dtype: object
col1    float64
col2    float64
dtype: object
"""

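I can work around it by doing df = df.astype(int) at the end, and there are other fixes too. But that shouldn't be necessary. I'm trying to figure out what I'm doing wrong that makes these columns become floats in the first place.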
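A minimal sketch of the `astype` workaround, assuming the columns have already been promoted to float64 (the frame below is hypothetical and built directly as float64 rather than through `.loc`). Casting to the explicit string `"int64"` instead of the builtin `int` avoids the platform-dependent width of `int`: with NumPy 1.x on Windows, `astype(int)` gives `int32`, which would explain the `int32` in the later output of this question.

```python
import pandas as pd

# Hypothetical frame whose integer-valued columns have already become float64.
df = pd.DataFrame(
    {"col1": [0.0, 0.0], "col2": [0.0, 0.0]},
    index=["row1", "row2"],
)
print(df.dtypes)  # float64, float64

# Cast once, after the last row has been added. "int64" is explicit about the
# width, whereas the builtin int maps to a platform-dependent NumPy integer.
df = df.astype("int64")
print(df.dtypes)  # int64, int64
```

The cast has to come after the last enlargement, because adding rows afterwards can promote the columns again.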

What is going on here?

Python version 3.7.1, pandas version 0.23.4

Edit:

I think there may be a misunderstanding. There are never any NaN values in this DataFrame. Right after creation it looks like this:

Empty DataFrame
Columns: [col1, col2]
Index: []

This is an empty DataFrame: df.shape is (0, 2), but there are no NaNs in it, just no rows yet.
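A quick check of that claim, as a small sketch using the same column names: an empty frame with two columns has shape (0, 2) and, having no cells at all, reports no missing values.

```python
import pandas as pd

df = pd.DataFrame(columns=["col1", "col2"])

print(df.shape)               # (0, 2): two columns, zero rows
print(df.empty)               # True
print(df.isna().sum().sum())  # 0: there are no cells, so nothing can be NaN
```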

I also found something even worse. Even if I do df = df.astype(int) after adding data to make it int, it becomes float again as soon as I add more data!

df = pd.DataFrame(columns=["col1", "col2"], dtype=int)
df.loc["row1", :] = int(0)
df.loc["row2", :] = int(0)
df = df.astype(int)  # Force it back to int.
print(df.dtypes)  # It is now ints again.
df.loc["row3", :] = int(0)  # Add another integer row.
print(df.dtypes)  # It is now float again???

# Output:
"""
col1    int32
col2    int32
dtype: object
col1    float64
col2    float64
dtype: object
"""

The suggested fix in version 0.24 seems unrelated to my problem. That feature is about the nullable integer data type, and there are no NaN or None values in my data.
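One plausible reading of the outputs above, offered as an assumption about pandas internals rather than something confirmed in this thread: when `.loc` receives a new row label together with a column indexer, pandas first enlarges the frame, filling the new row with NaN, and only then writes the values. `int64` cannot hold NaN, so the columns are promoted to `float64` in between, and writing integers afterwards does not undo that. If so, the nullable integer dtype may matter after all, even though the final data contain no NaN. The sketch below uses `DataFrame.reindex`, which does insert missing values for labels that are not yet present, to compare plain `int64` with the nullable `"Int64"` dtype (assumes a reasonably recent pandas, 1.0 or later).

```python
import pandas as pd

labels = ["row1", "row2", "row3"]
data = {"col1": [0, 0], "col2": [0, 0]}

# Plain NumPy int64 columns: the new row starts as NaN, forcing float64.
plain = pd.DataFrame(data, index=["row1", "row2"], dtype="int64")
plain = plain.reindex(labels)
plain.loc["row3"] = 0          # integers written into float64 columns stay float
print(plain.dtypes)            # float64, float64

# Nullable "Int64" columns: the new row starts as <NA>, and the dtype is kept.
nullable = pd.DataFrame(data, index=["row1", "row2"], dtype="Int64")
nullable = nullable.reindex(labels)
nullable.loc["row3"] = 0
print(nullable.dtypes)         # Int64, Int64
```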

Answer 1

df.loc["rowX"] = int(0) works and solves the problem raised in the question. df.loc["rowX", :] = int(0) does not. Quite a surprise.

df.loc["rowX"] = int(0) makes it possible to populate an empty DataFrame while preserving the desired dtype, but only one whole row at a time.
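Because that only works one whole row at a time, another option (a swapped-in technique, not the one this answer describes) is to avoid enlarging the DataFrame at all: collect the rows in a plain dict and build the frame once, letting pandas infer `int64` from the data. The row labels and values below are hypothetical.

```python
import pandas as pd

# Hypothetical rows, keyed by row label; values are listed in column order.
rows = {
    "row1": [0, 0],
    "row2": [0, 0],
    "row3": [0, 0],
}

# Building the frame in one step means no intermediate NaN row is created,
# so the integer columns are inferred as int64.
df = pd.DataFrame.from_dict(rows, orient="index", columns=["col1", "col2"])
print(df.dtypes)  # int64, int64
```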

df.loc["rowX"] = [np.int64(0), np.int64(1)] also works.

.loc[] is intended for label-based assignment, per https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.loc.html. Note: the 0.24 docs do not describe using .loc[] to insert new rows.
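For rows that already exist, label-based assignment writes into the existing columns, so integer values leave an `int64` frame as `int64`. A small sketch with hypothetical labels `a` and `b`:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame([[0, 0], [1, 1]], index=["a", "b"], columns=["col1", "col2"])

# Both rows already exist, so nothing has to be enlarged.
df.loc["a", :] = np.int64(5)              # one scalar for every column
df.loc["b"] = [np.int64(6), np.int64(7)]  # one value per column
print(df.dtypes)  # int64, int64
print(df)
```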

The docs do show .loc[] adding rows through assignment in a column-sensitive way, but they do so on a DataFrame that is already populated with data.
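When the frame is already populated, another way to add rows without going through `.loc` enlargement is `pd.concat` with a one-row frame of the same dtype. This is an alternative technique, not the one the documentation page shows; the row label `row2` is hypothetical.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame([[0, 0]], index=["row1"], columns=["col1", "col2"], dtype=np.int64)

# A one-row frame with the same columns and dtype as the existing data.
new_row = pd.DataFrame(
    [[np.int64(0), np.int64(1)]], index=["row2"], columns=df.columns, dtype=np.int64
)

# Concatenating two int64 frames with identical columns keeps int64.
df = pd.concat([df, new_row])
print(df.dtypes)  # int64, int64
print(df)
```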

But things get strange when slicing on an empty frame.

import pandas as pd
import numpy as np
import sys

print(sys.version)
print(pd.__version__)

print("int dtypes preserved")
# append on populated DataFrame
df = pd.DataFrame([[0, 0], [1,1]], index=['a', 'b'], columns=["col1", "col2"])
df.loc["c"] = np.int64(0)
# slice existing rows
df.loc["a":"c"] = np.int64(1)
df.loc["a":"c", "col1":"col2":1] = np.int64(2)
print(df.dtypes)

# no selection AND no data, remains np.int64 if defined as such
df = pd.DataFrame(columns=["col1", "col2"], dtype=np.int64)
df.loc[:, "col1":"col2":1] = np.int64(0)
df.loc[:,:] = np.int64(0)
print(df.dtypes)

# and works if no index but data
df = pd.DataFrame([[0, 0], [1,1]], columns=["col1", "col2"])
df.loc[:,"col1":"col2":1] = np.int64(0)
print(df.dtypes)

# the surprise... label based insertion for the entire row does not convert to float
df = pd.DataFrame(columns=["col1", "col2"], dtype=np.int64)
df.loc["a"] = np.int64(0)
print(df.dtypes)

# a surprise because referring to all columns, as above, does convert to float
print("unexpectedly converted to float dtypes")
df = pd.DataFrame(columns=["col1", "col2"], dtype=np.int64)
df.loc["a", "col1":"col2"] = np.int64(0)
print(df.dtypes)

3.7.2 (default, Mar 19 2019, 10:33:22) 
[Clang 10.0.0 (clang-1000.11.45.5)]
0.24.2
int dtypes preserved
col1    int64
col2    int64
dtype: object
col1    int64
col2    int64
dtype: object
col1    int64
col2    int64
dtype: object
col1    int64
col2    int64
dtype: object
unexpectedly converted to float dtypes
col1    float64
col2    float64
dtype: object

How to stop a Pandas DataFrame from converting int to float for no reason?
