Question

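Problem: how to replace all occurrences of a specific int64 value in a DataFrame without incorrectly replacing int32 values that are not equal to it.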

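DataFrame.replace incorrectly replaces int32 values when it is given a large int64 value. Below is a minimal example in which I want to replace every field holding the large value with -1. Since all of the data is zero, nothing should be updated. However, column 'a' becomes -1 after the replacement.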

import pandas
import numpy
dtype = [('a','int32'), ('b','int64'), ('c','float32')]
index = ['x', 'y']
columns = ['a','b','c']
values = numpy.zeros(2, dtype=dtype)
df2 = pandas.DataFrame(values, index=index)
df2.replace(-9223372036854775808, -1)

The output is:

     a  b     c
x   -1  0   0.0
y   -1  0   0.0

Edit:

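It looks like the NumPy type cast is what goes wrong, but the question remains: how do I avoid it in the DataFrame conversion? Note: -9223372036854775808 is hex 8000000000000000, whose low 32 bits are all zero, which would explain why the cast to int32 yields 0: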

x = numpy.array(-9223372036854775808, dtype='int64')
print('as int32: ', x.astype(numpy.int32))
#produces
#('as int32: ', array(0, dtype=int32))

Answer 1

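You correctly identified that the problem is caused by type narrowing. Why not do the replacement only in those columns whose dtype matches, or is at least wide enough to hold the value?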
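The answer's original code is not shown here, so the following is only a minimal sketch of that idea, not the answerer's own implementation. The helper name `replace_in_wide_columns` is hypothetical. It assumes plain NumPy dtypes (no pandas extension dtypes) and deliberately skips non-integer columns such as 'c'.

```python
import numpy as np
import pandas as pd

SENTINEL = -9223372036854775808  # smallest int64 value, hex 8000000000000000


def replace_in_wide_columns(df, old, new):
    """Replace `old` with `new` only in integer columns whose dtype can hold `old`.

    Hypothetical helper: columns that are too narrow to represent `old`
    (for example int32 here) are left untouched, so no narrowing cast happens.
    """
    out = df.copy()
    for name in out.columns:
        dtype = out[name].dtype
        if not np.issubdtype(dtype, np.integer):
            continue  # assumption: float and other non-integer columns are skipped
        info = np.iinfo(dtype)
        if info.min <= old <= info.max:
            # Boolean-mask assignment; only rows equal to `old` are changed.
            out.loc[out[name] == old, name] = new
    return out


# Same layout as the example in the question: all-zero data.
dtype = [('a', 'int32'), ('b', 'int64'), ('c', 'float32')]
values = np.zeros(2, dtype=dtype)
df2 = pd.DataFrame(values, index=['x', 'y'])
print(replace_in_wide_columns(df2, SENTINEL, -1))
```

Because the int32 column cannot represent the value at all, it cannot contain it either, so skipping that column loses nothing.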

