Question

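My CSV file contains a few numeric values that are 20 digits long. When I read the file into a DataFrame, that column is read as dtype object. I need to convert all of the numeric data to integers.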

My CSV data looks like this:

emp_id,age,salary,marital
21012334509821345944,22,4500,married
21012334509821345945,22,4510,single
21012334509821345946,22,45040,married
21012334509821345947,22,41500,single
21012334509821345948,22,54500,single
21012334509821345949,22,64500,married

I tried:

import pandas as pd

d1 = pd.read_csv('D:\\Exercise\\test.csv')
d1.set_index('emp_id', inplace=True)
d1.index = d1.index.map(int)  # OverflowError: int too big to convert
print(d1.index.values)

If I comment out the index mapping line, I get the following output: ['21012334509821345944' '21012334509821345945' '21012334509821345946' '21012334509821345947' '21012334509821345948' '21012334509821345949']

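But I need integers. I tried casting just the first column. Is it possible to cast every column in the DataFrame that holds numeric values? I also tried converting with numpy and got the same error. Thanks.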

Answer 1

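The largest value that a 64-bit unsigned integer (np.uint64) can represent is 18446744073709551615. Your 20-digit IDs (for example 21012334509821345944) are larger than that, so you probably won't be able to store them in a NumPy integer dtype.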
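To make the limit concrete, here is a small sketch in plain Python (no NumPy needed) that compares one of the IDs from the question against the 64-bit bounds. The constant names are only illustrative.

```python
# Largest values representable by 64-bit integer dtypes.
UINT64_MAX = 2**64 - 1   # same value np.uint64 tops out at
INT64_MAX = 2**63 - 1    # same value np.int64 tops out at

# Python's built-in int has arbitrary precision, so parsing the ID works.
emp_id = int("21012334509821345944")

fits_uint64 = emp_id <= UINT64_MAX
fits_int64 = emp_id <= INT64_MAX
print(fits_uint64, fits_int64)  # False False
```

The ID parses fine as a Python int. It only fails once it has to fit into a fixed-width 64-bit dtype.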

Answer 2

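Pandas / NumPy store integers as 64-bit values. Maybe there are wider types, but the point is that the range is limited. You need to keep the column as dtype object but store the values as Python int.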

Here is one way:

df.emp_id.values[:] = [*map(int, df.emp_id)]

Then you can do math.

df.emp_id // int(1e10)

0    2101233450
1    2101233450
2    2101233450
3    2101233450
4    2101233450
5    2101233450
Name: emp_id, dtype: object

The math won't be optimized (it runs element by element on Python objects), but it should work.
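For reference, here is a self-contained sketch of the same idea. It rests on a few assumptions: the CSV is supplied through io.StringIO instead of the file on disk, emp_id is read as text via dtype={"emp_id": str}, and the column is reassigned rather than written through .values (in-place writes through .values may not be allowed on pandas versions that use copy-on-write). Treat it as an illustration, not the only way to do this.

```python
import io
import pandas as pd

# Two rows from the question, inlined so the example runs without a file.
csv_text = """emp_id,age,salary,marital
21012334509821345944,22,4500,married
21012334509821345945,22,4510,single
"""

# Read emp_id as text so pandas does not try to fit it into a 64-bit dtype.
df = pd.read_csv(io.StringIO(csv_text), dtype={"emp_id": str})

# Build a Series of Python ints; dtype=object is passed explicitly so
# pandas does not attempt to infer a fixed-width integer type.
df["emp_id"] = pd.Series(
    [int(s) for s in df["emp_id"]], index=df.index, dtype=object
)

# Arithmetic now runs element by element on Python ints.
prefix = df["emp_id"] // 10**10
print(df["emp_id"].dtype)
print(prefix.tolist())
```

The other columns (age, salary) fit comfortably in 64 bits, so read_csv already parses them as integers. Only the 20-digit column needs this treatment.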
