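I am creating a small Pandas DataFrame and adding some data to it that should be integers. However, even though I try very hard to set the dtype explicitly to int and only provide int values, the columns always end up as floats. This makes no sense to me at all, and the behaviour does not even seem entirely consistent.
Consider the following Python script: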
import pandas as pd
df = pd.DataFrame(columns=["col1", "col2"]) # No dtype specified.
print(df.dtypes) # dtypes are object, since there is no information yet.
df.loc["row1", :] = int(0) # Add integer data.
print(df.dtypes) # Both columns have now become int64, as expected.
df.loc["row2", :] = int(0) # Add more integer data.
print(df.dtypes) # Both columns are now float64???
print(df) # Shows as 0.0.
# Let's try again, but be more specific.
del df
df = pd.DataFrame(columns=["col1", "col2"], dtype=int) # Explicitly set dtype.
print(df.dtypes) # For some reason both columns are already float64???
df.loc["row1", :] = int(0)
print(df.dtypes) # Both columns are still float64.
# Output:
"""
col1 object
col2 object
dtype: object
col1 int64
col2 int64
dtype: object
col1 float64
col2 float64
dtype: object
col1 col2
row1 0.0 0.0
row2 0.0 0.0
col1 float64
col2 float64
dtype: object
col1 float64
col2 float64
dtype: object
"""
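I can work around it by doing df = df.astype(int) at the end. There are other ways to fix it as well. But that should not be necessary. I am trying to figure out what I am doing wrong that makes these columns become floats in the first place.
What is going on?
Python version 3.7.1, Pandas version 0.23.4
Edit:
I think some people may have misunderstood. There are never any NaN values in this DataFrame. Immediately after creation it looks like this: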
Empty DataFrame
Columns: [col1, col2]
Index: []
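This is an empty DataFrame, df.shape[0] == 0, but there is no NaN in it; there are just no rows yet.
I have also discovered something even worse. Even if I do df = df.astype(int) after adding data, so that it becomes int, it becomes float again as soon as I add more data!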
df = pd.DataFrame(columns=["col1", "col2"], dtype=int)
df.loc["row1", :] = int(0)
df.loc["row2", :] = int(0)
df = df.astype(int) # Force it back to int.
print(df.dtypes) # It is now ints again.
df.loc["row3", :] = int(0) # Add another integer row.
print(df.dtypes) # It is now float again???
# Output:
"""
col1 int32
col2 int32
dtype: object
col1 float64
col2 float64
dtype: object
"""
The suggested fix in version 0.24 does not seem related to my problem. That feature is about the Nullable Integer data type. There are no NaN or None values in my data.
Answer 0: (score: 1)
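df.loc["rowX"] = int(0) works and solves the problem posed in the question. df.loc["rowX", :] = int(0) does not. That is a surprise.
df.loc["rowX"] = int(0) makes it possible to populate an empty DataFrame while preserving the desired dtype, but only for a whole row at a time.
df.loc["rowX"] = [np.int64(0), np.int64(1)] also works.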
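As a minimal sketch of the whole-row pattern above: add_int_row is a hypothetical helper (it is not part of pandas) that assigns one row by label without a column slice. Because the dtype that results from this kind of assignment may differ between pandas versions, the helper checks the dtypes afterwards and casts back to int64 only if a column stopped being an integer.

```python
import numpy as np
import pandas as pd


def add_int_row(df, label, values):
    """Hypothetical helper: add one whole row by label and keep integer dtypes."""
    # Whole-row form, with no column slice such as df.loc[label, :].
    df.loc[label] = values
    # Defensive check: the outcome of the assignment may depend on the pandas version.
    if not all(pd.api.types.is_integer_dtype(t) for t in df.dtypes):
        df = df.astype(np.int64)
    return df


df = pd.DataFrame(columns=["col1", "col2"], dtype=np.int64)
df = add_int_row(df, "a", [np.int64(0), np.int64(1)])
df = add_int_row(df, "b", [np.int64(2), np.int64(3)])
print(df.dtypes)
```

The cast is only a fallback. If the whole-row assignment already preserves the dtype on your version, the astype call is never reached.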
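.loc[] is appropriate for label-based assignment according to https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.loc.html. Note: the 0.24 documentation does not describe using .loc[] to insert new rows.
The documentation shows .loc[] being used to add rows by assignment in a column-sensitive way, but it does so where the DataFrame is already populated with data.
It gets weird when slicing on an empty frame.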
import pandas as pd
import numpy as np
import sys
print(sys.version)
print(pd.__version__)
print("int dtypes preserved")
# append on populated DataFrame
df = pd.DataFrame([[0, 0], [1,1]], index=['a', 'b'], columns=["col1", "col2"])
df.loc["c"] = np.int64(0)
# slice existing rows
df.loc["a":"c"] = np.int64(1)
df.loc["a":"c", "col1":"col2":1] = np.int64(2)
print(df.dtypes)
# no selection AND no data, remains np.int64 if defined as such
df = pd.DataFrame(columns=["col1", "col2"], dtype=np.int64)
df.loc[:, "col1":"col2":1] = np.int64(0)
df.loc[:,:] = np.int64(0)
print(df.dtypes)
# and works if no index but data
df = pd.DataFrame([[0, 0], [1,1]], columns=["col1", "col2"])
df.loc[:,"col1":"col2":1] = np.int64(0)
print(df.dtypes)
# the surprise... label based insertion for the entire row does not convert to float
df = pd.DataFrame(columns=["col1", "col2"], dtype=np.int64)
df.loc["a"] = np.int64(0)
print(df.dtypes)
# a surprise because referring to all columns, as above, does convert to float
print("unexpectedly converted to float dtypes")
df = pd.DataFrame(columns=["col1", "col2"], dtype=np.int64)
df.loc["a", "col1":"col2"] = np.int64(0)
print(df.dtypes)
Output:
3.7.2 (default, Mar 19 2019, 10:33:22)
[Clang 10.0.0 (clang-1000.11.45.5)]
0.24.2
int dtypes preserved
col1 int64
col2 int64
dtype: object
col1 int64
col2 int64
dtype: object
col1 int64
col2 int64
dtype: object
col1 int64
col2 int64
dtype: object
unexpectedly converted to float dtypes
col1 float64
col2 float64
dtype: object
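One possible explanation for the float conversion, which is my assumption and is not confirmed by the documentation linked above: when a label-based assignment has to create a new row, the new cells may briefly hold a missing-value placeholder before the value is written, and an int64 column cannot hold NaN. The sketch below does not use .loc at all. It only shows the well-defined upcast that happens when an integer Series gains a label that has no value yet, and that filling the gap afterwards does not bring the integer dtype back.

```python
import pandas as pd

# An integer Series with a single labelled value.
s = pd.Series([0], index=["row1"], dtype="int64")
print(s.dtype)  # int64

# Adding a label that has no value yet forces a NaN placeholder,
# which an int64 column cannot store, so the result is upcast.
grown = s.reindex(["row1", "row2"])
print(grown.dtype)  # float64

# Filling the gap afterwards keeps the float dtype.
filled = grown.fillna(0)
print(filled.dtype)  # float64

# An explicit cast is needed to get integers back.
restored = filled.astype("int64")
print(restored.dtype)  # int64
```

If this is indeed what happens internally, it would explain why the frame in the question never shows a NaN and still ends up as float64.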