Question

I am trying to read a csv file into a pandas DataFrame using the following:

dp = pd.read_csv('products.csv', header = 0,  dtype = {'name': str,'review': str,
                                                      'rating': int,'word_count': dict}, engine = 'c')
print dp.shape
for col in dp.columns:
    print 'column', col,':', type(col[0])
print type(dp['rating'][0])
dp.head(3)

This is the output:

(183531, 4)
column name : <type 'str'>
column review : <type 'str'>
column rating : <type 'str'>
column word_count : <type 'str'>
<type 'numpy.int64'>

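I can understand that pandas might find it difficult to convert the string representation of a dictionary into a dictionary. But what about the contents of the "rating" column? How can that column be both str and numpy.int64???

By the way, tweaks such as not specifying the engine or the header do not change anything.

Thanks and regards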

Answer 1

I think you should check this first: Pandas: change data type of columns

It is in the top 5 results when you google "pandas dataframe column type".

Answer 2

Just do:

for col in dp.columns:
    print 'column', col,':', col[0]

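You will see that this prints the first letter of each column name, which is a string. Note that you are iterating over the column names, not over each Series.

What you want instead is to check the type of the values in each column inside the loop: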

for col in dp.columns:
    print 'column', col,':', type(dp[col][0])

...just as you did for the rating column!
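The difference can be reproduced without the original file. The sketch below is written for Python 3 and uses a made-up two-row sample in place of products.csv (the rows and the names `label_types` and `value_types` are assumptions for illustration only):

```python
import io
import pandas as pd

# Hypothetical two-row sample standing in for products.csv
csv_text = io.StringIO(
    "name,review,rating\n"
    "Widget,Works fine,5\n"
    "Gadget,Broke quickly,1\n"
)
dp = pd.read_csv(csv_text, dtype={"name": str, "review": str, "rating": int})

# Iterating over dp.columns yields the column labels (plain strings),
# so col[0] is only the first character of each label.
label_types = {col: type(col[0]).__name__ for col in dp.columns}

# Indexing the DataFrame with the label returns the Series;
# its first element has the type of the stored value.
value_types = {col: type(dp[col].iloc[0]).__name__ for col in dp.columns}

print(label_types)
print(value_types)
```

Every entry in `label_types` is a string type, whatever the column holds, while `value_types` reports an integer type for rating.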

Answer 3

Use:

dp.info()

to see the data types of the columns. dp.columns refers to the column header names, which are strings.
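If only the dtypes are needed, without the rest of the summary, the `dtypes` attribute may be more convenient. A minimal Python 3 sketch, again on a made-up sample rather than the original file:

```python
import io
import pandas as pd
from pandas.api.types import is_integer_dtype

# Hypothetical sample; the real products.csv is not available here
csv_text = io.StringIO("name,rating\nWidget,5\nGadget,1\n")
dp = pd.read_csv(csv_text)

# dp.dtypes is a Series mapping each column label to that column's dtype
print(dp.dtypes)

# pandas.api.types provides predicates for checking a dtype in code
rating_is_int = is_integer_dtype(dp["rating"])
print(rating_is_int)
```

The predicate form avoids comparing against a specific integer width, which can differ between platforms.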

Answer 4

Simply use read_table with "," as the delimiter, and pass literal_eval as the converter function for the values in the relevant columns.

from ast import literal_eval
import pandas as pd

# literal_eval parses each cell's text into the Python object it represents
recipes = pd.read_table("\\souravD\\PP_recipes.csv", sep=r',',
                      names=["id", "i", "name_tokens", "ingredient_tokens", "steps_tokens", "techniques","calorie_level","ingredient_ids"],
                      converters = {'name_tokens' : literal_eval,
                                    'ingredient_tokens' : literal_eval,
                                    'steps_tokens' : literal_eval,
                                    'techniques' : literal_eval,
                                    'ingredient_ids' : literal_eval},header=0)
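Applied to the word_count column from the question, the same idea might look like the sketch below. It is written for Python 3, and the sample row is an assumption: it supposes each word_count cell holds a quoted Python dict literal, which may not match the real file.

```python
import io
from ast import literal_eval
import pandas as pd

# Hypothetical sample: the word_count cell is quoted because the
# dict literal itself contains commas
csv_text = io.StringIO(
    "name,rating,word_count\n"
    "Widget,5,\"{'works': 1, 'fine': 1}\"\n"
)

# The converter is called on the raw string of every word_count cell
dp = pd.read_csv(csv_text, converters={"word_count": literal_eval})

first = dp["word_count"].iloc[0]
print(type(first), first)
```

literal_eval only accepts Python literals, so it is a safer choice than eval for text read from a file, but it raises an error on cells that are not valid literals.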


Setting column types when reading a csv with pandas