Question

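I'm fairly new to pandas, and while trying to define dtypes for reading a large file I get the following error: NameError: name 'int64' is not defined.

I made sure pandas and numpy are installed and up to date, but as far as I understand this is a Python error. I've looked at several tutorials and nobody seems to run into this problem. The code below produces the error: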

import pandas as pd
import numpy as np

data = pd.read_csv("file.csv", encoding="utf-16le", dtype={
    "time": int64,
    "created_date_sk": int64,
    "eventType": object,
    "itemId": int64,
    "fieldId": int64,
    "userId": int64
})

data.head()

Full traceback:

Traceback (most recent call last):
  File "manipulate.py", line 5, in <module>
    "time": int64,
NameError: name 'int64' is not defined

I expected the int64 type to be recognized, but it seems only the int type is accepted. The object type seems to work.

Answer 1

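The interpreter is telling you that it doesn't recognize the name int64, because int64 belongs to numpy and has to be referenced through it.

Change your code to this (it complains that file.csv doesn't exist on my filesystem, but that's expected):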

import pandas as pd
import numpy as np

data = pd.read_csv("file.csv", encoding="utf-16le", dtype={
    "time": np.int64,
    "created_date_sk": np.int64,
    "eventType": object,
    "itemId": np.int64,
    "fieldId": np.int64,
    "userId": np.int64
})

data.head()

Or, better yet, import the name at the top of the file so the original code works unchanged:

from numpy import int64
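
To make the cause concrete, here is a minimal sketch of the name lookup itself, with no CSV file involved. The variable names (`message`, `qualified`) are only illustrative and do not come from the question.

```python
import numpy as np

# A bare `int64` is looked up as an ordinary Python name.
# Nothing in this module defines or imports it, so the lookup fails.
try:
    dtype = int64
except NameError as exc:
    message = str(exc)

# Qualifying the name with the numpy module resolves it.
qualified = np.int64

# Importing the name directly makes the bare form valid as well.
from numpy import int64
imported = int64

print(message)
print(qualified is imported)
```

Either spelling refers to the same numpy type, so the choice between `np.int64` and a direct import is mostly a matter of style.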

Answer 2

Just use int. pandas treats Python int as its default integer dtype, which is int64 on most platforms.

import pandas as pd
import numpy as np

data = pd.read_csv("file.csv", encoding="utf-16le", dtype={
    "time": int,
    "created_date_sk": int,
    "eventType": object,
    "itemId": int,
    "fieldId": int,
    "userId": int
})

data.head()
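
If you want to confirm what dtype=int resolves to on your machine, a small check along these lines may help. It is only a sketch: the in-memory CSV text is made up for illustration and stands in for file.csv, and the integer width printed can depend on your platform and library versions.

```python
import io

import pandas as pd

# Hypothetical sample data standing in for file.csv.
csv_text = "time,eventType\n1,click\n2,view\n"

df = pd.read_csv(
    io.StringIO(csv_text),
    dtype={"time": int, "eventType": object},
)

# Inspect the dtype pandas actually chose for the integer column.
time_dtype = df["time"].dtype
print(time_dtype)
```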

Answer 3

You are passing int64 as a variable (a bare name). You have to pass it as a string instead. Use the code below:

import pandas as pd
import numpy as np

data = pd.read_csv("file.csv", encoding="utf-16le", dtype={
    "time": 'int64',
    "created_date_sk": 'int64',
    "eventType": 'object',
    "itemId": 'int64',
    "fieldId": 'int64',
    "userId": 'int64'
})

data.head()

Hope this helps.

Answer 4

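This error occurs because int64 is not defined in the local Python namespace, so using it in the dictionary raises a NameError. There are a couple of ways to fix it.

Option 1: use strings

The simplest option is to wrap the data type in a string. Just change int64 to "int64" in the dtype dictionary.

Option 2: use numpy

Change int64 to np.int64. (Note that this requires you to import the numpy package.)

I prefer option 2.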
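
As a rough check that the two options are interchangeable, the sketch below reads the same data both ways and compares the resulting dtypes. The in-memory CSV text and the variable names are assumptions made for illustration, not part of the question.

```python
import io

import numpy as np
import pandas as pd

# Hypothetical sample data standing in for file.csv.
csv_text = "time,itemId,eventType\n1,10,click\n2,20,view\n"

# Option 1: dtypes given as strings.
by_string = pd.read_csv(
    io.StringIO(csv_text),
    dtype={"time": "int64", "itemId": "int64", "eventType": "object"},
)

# Option 2: dtypes given as numpy types.
by_numpy = pd.read_csv(
    io.StringIO(csv_text),
    dtype={"time": np.int64, "itemId": np.int64, "eventType": object},
)

# Compare the per-column dtypes of the two frames.
same_dtypes = list(by_string.dtypes) == list(by_numpy.dtypes)
print(same_dtypes)
```

The string form avoids the extra import, while the numpy form lets a misspelled type name fail immediately as a NameError or AttributeError when the dictionary is built.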
