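I am importing a DataFrame from a text file.
I want to specify the data types of the columns, but pandas seems to ignore the dtype argument.
A working example: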
from io import StringIO
import pandas as pd
string = 'USAF WBAN STATION NAME CTRY ST CALL LAT LON ELEV(M) BEGIN END\n007026 99999 WXPOD 7026 AF +00.000 +000.000 +7026.0 20120713 20170822\n007070 99999 WXPOD 7070 AF +00.000 +000.000 +7070.0 20140923 20150926'
f = StringIO(string)
df = pd.read_fwf(f,
                 colspecs = [(0,6),
                             (7,12),
                             (13,41),
                             (43,45),
                             (48,50),
                             (51,55),
                             (57,64),
                             (65,73),
                             (74,81),
                             (82,90),
                             (91,101)],
                 dtypes = {'USAF' : str,
                           'WBAN' : str,
                           'STATION NAME' : str,
                           'CT' : str,
                           'ST' : str,
                           'CALL' : str,
                           'LAT' : float,
                           'LON' : float,
                           'ELEV(M)' : float,
                           'BEGIN' : int,
                           'END' : int,},
                 )
df.dtypes
returns
USAF int64
WBAN int64
STATION NAME object
CT object
ST float64
CALL float64
LAT float64
LON float64
ELEV(M) float64
BEGIN int64
END int64
dtype: object
Why does this happen? How can I force the first columns to be strings?
Answer 0 (score: 1)
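There is a problem with dtype conversion in read_fwf: pandas guesses the types itself and applies them. Use converters explicitly here instead. You have to do this while the DataFrame is being created, because if you convert afterwards you will already have lost the leading 0s.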
from io import StringIO
import pandas as pd
string = 'USAF WBAN STATION NAME CTRY ST CALL LAT LON ELEV(M) BEGIN END\n007026 99999 WXPOD 7026 AF +00.000 +000.000 +7026.0 20120713 20170822\n007070 99999 WXPOD 7070 AF +00.000 +000.000 +7070.0 20140923 20150926'
f = StringIO(string)
df = pd.read_fwf(f,
                 colspecs = [(0,6),
                             (7,12),
                             (13,41),
                             (43,45),
                             (48,50),
                             (51,55),
                             (57,64),
                             (65,73),
                             (74,81),
                             (82,90),
                             (91,101)],
                 # str is applied to the raw text of each cell,
                 # so leading zeros are kept
                 converters = {'USAF': str,
                               'WBAN': str,
                               'STATION NAME': str,
                               'CT': str,
                               'ST': str,
                               'CALL': str}
                 )
>>> df.dtypes
USAF object
WBAN object
STATION NAME object
CT object
ST object
CALL object
LAT float64
LON float64
ELEV(M) float64
BEGIN int64
END int64
dtype: object
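To see why the conversion has to happen at read time, here is a minimal, self-contained sketch. It assumes a shortened, hypothetical four-column layout (USAF, WBAN, NAME, BEGIN) rather than the full station file, and it builds the fixed-width text with a small helper, to_fixed_width, which is not part of pandas, so that the columns are guaranteed to line up with colspecs.

```python
from io import StringIO

import pandas as pd

# Hypothetical, shortened layout: (start, end) offsets with one-space gaps.
colspecs = [(0, 6), (7, 12), (13, 23), (24, 32)]
rows = [
    ("USAF", "WBAN", "NAME", "BEGIN"),
    ("007026", "99999", "WXPOD 7026", "20120713"),
    ("007070", "99999", "WXPOD 7070", "20140923"),
]


def to_fixed_width(row):
    """Pad each field to the width of its column spec (illustrative helper)."""
    return " ".join(
        field.ljust(end - start) for field, (start, end) in zip(row, colspecs)
    )


text = "\n".join(to_fixed_width(row) for row in rows)

# 1) Let pandas infer the types: USAF is parsed as an integer,
#    so the leading zeros are already gone when we convert afterwards.
inferred = pd.read_fwf(StringIO(text), colspecs=colspecs)
print(inferred["USAF"].astype(str).tolist())

# 2) Convert while reading: the converter receives the raw cell text,
#    so the leading zeros survive.
converted = pd.read_fwf(
    StringIO(text),
    colspecs=colspecs,
    converters={"USAF": str, "WBAN": str},
)
print(converted["USAF"].tolist())
```

The first read shows the problem described in the question: once USAF has been parsed as a number, converting it back to a string cannot restore the zeros. The second read keeps the original text for the columns listed in converters and leaves the remaining columns to normal type inference.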