Question

I have an .xls file that looks like this:

col_a       col_b   col_c   col_d
5376594                     hello
12028432                    world
17735732    hello   12      hello
17736843    world           world

When I read the file with

test = pandas.read_excel('F:/test.xls')

the table is read with the following column types:

>>> test.dtypes
col_a       int64
col_b       object
col_c       float64
col_d       object

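My problem is that I would like col_b and col_d to be string columns. Since I am quite new to Python, could you point me to

what is happening behind the scenes, and
whether there is any parameter I can adjust to read the columns as strings?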

EDIT: types of the first row, as asked in the comments

>>> type(test.iloc[0]['col_a'])
<class 'numpy.int64'>
>>> type(test.iloc[0]['col_b'])
<class 'float'>
>>> type(test.iloc[0]['col_c'])
<class 'numpy.float64'>
>>> type(test.iloc[0]['col_d'])
<class 'str'>

Answer 1

You can define the dtype in pandas.read_csv.

dtype : Type name or dict of column -> type. If not specified, data types will be inferred. (Unsupported with engine='python')

Why NaN is a float: see here. The available dtypes are listed here (at the end of the page).
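To make the "behind the scenes" part concrete, here is a minimal sketch of how missing cells drive the inferred types. It assumes the sheet has been exported to CSV text (the `csv_text` variable is my own stand-in for the .xls file), so it illustrates the inference rules rather than reproducing the exact read_excel call.

```python
import io
import math

import pandas as pd

# Stand-in for the spreadsheet from the question (an assumption: same cells, CSV form).
csv_text = """col_a,col_b,col_c,col_d
5376594,,,hello
12028432,,,world
17735732,hello,12,hello
17736843,world,,world"""

df = pd.read_csv(io.StringIO(csv_text))

# col_c has one number and three empty cells. NaN has no integer
# representation, so the whole column is stored as float64.
print(df["col_c"].dtype)

# An empty cell in a text column is also NaN, and NaN is a plain float.
# That is why row 0 of col_b reports float while row 2 reports str.
first_b = df["col_b"].iloc[0]
third_b = df["col_b"].iloc[2]
print(type(first_b), math.isnan(first_b))
print(type(third_b))
```

So the column itself is not "half float": it is a text column whose missing entries are marked with the float value NaN.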

Test:

import pandas
import io
import numpy

col_types = {"col_a": numpy.int32, "col_b": str, "col_c": str, "col_d": str}

temp=u"""col_a,col_b,col_c,col_d
5376594,,,hello
12028432,,,world
17735732,hello,12,hello
17736843,world,,world"""

test = pandas.read_csv(io.StringIO(temp), header=0, sep=",", dtype=col_types)



# Row 0: col_b and col_c are empty, so they come back as NaN (a float)
print(type(test.iloc[0]['col_a']))
print(type(test.iloc[0]['col_b']))
print(type(test.iloc[0]['col_c']))
print(type(test.iloc[0]['col_d']))
#
#<class 'numpy.int32'>
#<class 'float'>
#<class 'float'>
#<class 'str'>

# Row 2: every cell is filled, so the str dtype is applied as requested
print(type(test.iloc[2]['col_a']))
print(type(test.iloc[2]['col_b']))
print(type(test.iloc[2]['col_c']))
print(type(test.iloc[2]['col_d']))
#
#<class 'numpy.int32'>
#<class 'str'>
#<class 'str'>
#<class 'str'>

print(test)
print(test.dtypes)
#
#col_a     int32
#col_b    object
#col_c    object
#col_d    object
#dtype: object
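If the goal is for every cell in col_b and col_d to be an actual string, including the empty ones, one option is to replace the NaN markers after reading. This is only a sketch: `csv_text` and `string_cols` are names I made up, and it again reads CSV text rather than the .xls file. As far as I know, pandas.read_excel also accepts dtype and converters arguments in recent versions, so the same mapping can presumably be passed there, but I have not exercised that here.

```python
import io

import pandas as pd

# Stand-in for the spreadsheet from the question (an assumption: same cells, CSV form).
csv_text = """col_a,col_b,col_c,col_d
5376594,,,hello
12028432,,,world
17735732,hello,12,hello
17736843,world,,world"""

# Hypothetical list of the columns that should hold strings only.
string_cols = ["col_b", "col_d"]

# Ask for str on just those columns; col_a and col_c are still inferred.
df = pd.read_csv(io.StringIO(csv_text), dtype={name: str for name in string_cols})

# Empty cells are still NaN at this point, so swap them for empty strings.
df[string_cols] = df[string_cols].fillna("")

print([type(value).__name__ for value in df["col_b"]])
print(df)
```

Whether an empty string or NaN is the better marker depends on what you do next: NaN works with isna() and dropna(), while an empty string keeps string methods from tripping over floats.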

pandas reads Excel "General" columns as object
