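pandas.read_fwf ignores the provided dtypes

Time: 2018-11-01 16:51:31

Tags: python pandas

I am importing a DataFrame from a text file. I would like to specify the data type of each column, but pandas seems to ignore the dtype input.

A working example: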

from io import StringIO
import pandas as pd

string = 'USAF   WBAN  STATION NAME                  CTRY ST CALL  LAT     LON      ELEV(M) BEGIN    END\n007026 99999 WXPOD 7026                    AF            +00.000 +000.000 +7026.0 20120713 20170822\n007070 99999 WXPOD 7070                    AF            +00.000 +000.000 +7070.0 20140923 20150926'

f = StringIO(string)

df = pd.read_fwf(f,
                 colspecs = [(0,6),
                             (7,12),
                             (13,41),
                             (43,45),
                             (48,50),
                             (51,55),
                             (57,64),
                             (65,73),
                             (74,81),
                             (82,90),
                             (91,101)],
                 dtypes = {'USAF'         : str,
                           'WBAN'         : str,
                           'STATION NAME' : str,
                           'CT'           : str,
                           'ST'           : str,
                           'CALL'         : str,
                           'LAT'          : float,
                           'LON'          : float,
                           'ELEV(M)'      : float,
                           'BEGIN'        : int,
                           'END'          : int,},
                 )
df.dtypes

returns

USAF              int64
WBAN              int64
STATION NAME     object
CT               object
ST              float64
CALL            float64
LAT             float64
LON             float64
ELEV(M)         float64
BEGIN             int64
END               int64
dtype: object

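Why does this happen? How can I force the first column to be a string?

1 answer:

Answer 0 (score: 1)

The dtype conversion is not being applied by read_fwf here; pandas is guessing the types and applying its own guesses. Use converters explicitly instead. You have to do this while the DataFrame is being created, because if you convert afterwards the leading zeros are already lost.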

from io import StringIO
import pandas as pd

string = 'USAF   WBAN  STATION NAME                  CTRY ST CALL  LAT     LON      ELEV(M) BEGIN    END\n007026 99999 WXPOD 7026                    AF            +00.000 +000.000 +7026.0 20120713 20170822\n007070 99999 WXPOD 7070                    AF            +00.000 +000.000 +7070.0 20140923 20150926'

f = StringIO(string)
df = pd.read_fwf(f,
                 colspecs = [(0,6),
                             (7,12),
                             (13,41),
                             (43,45),
                             (48,50),
                             (51,55),
                             (57,64),
                             (65,73),
                             (74,81),
                             (82,90),
                             (91,101)],
                 # converters receive the raw text of each field, so leading zeros survive
                 converters = {'USAF'         : str,
                               'WBAN'         : str,
                               'STATION NAME' : str,
                               'CT'           : str,
                               'ST'           : str,
                               'CALL'         : str},
                 )
>>> df.dtypes
USAF             object
WBAN             object
STATION NAME     object
CT               object
ST               object
CALL             object
LAT             float64
LON             float64
ELEV(M)         float64
BEGIN             int64
END               int64
dtype: object
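
One more thing worth checking: the question passes `dtypes`, whereas the keyword documented for the pandas readers is `dtype` (singular). The sketch below is only an illustration, assuming a pandas version in which `read_fwf` accepts `dtype`. The names `COLSPECS`, `WIDTHS`, `SAMPLE` and the helper `fixed_row` are hypothetical and exist only to rebuild the sample rows at the column offsets used above. It also reads the same data with inferred types and converts afterwards, to show where the leading zeros go missing.

```python
from io import StringIO

import pandas as pd

# Column offsets copied from the colspecs used in the question.
COLSPECS = [(0, 6), (7, 12), (13, 41), (43, 45), (48, 50), (51, 55),
            (57, 64), (65, 73), (74, 81), (82, 90), (91, 101)]

# Hypothetical helper (not part of pandas): pad every field to a fixed
# width so that the sample rows line up with COLSPECS.
WIDTHS = [7, 6, 30, 5, 3, 6, 8, 9, 8, 9, 8]


def fixed_row(*fields):
    return "".join(field.ljust(width) for field, width in zip(fields, WIDTHS))


SAMPLE = "\n".join([
    fixed_row("USAF", "WBAN", "STATION NAME", "CTRY", "ST", "CALL",
              "LAT", "LON", "ELEV(M)", "BEGIN", "END"),
    fixed_row("007026", "99999", "WXPOD 7026", "AF", "", "",
              "+00.000", "+000.000", "+7026.0", "20120713", "20170822"),
    fixed_row("007070", "99999", "WXPOD 7070", "AF", "", "",
              "+00.000", "+000.000", "+7070.0", "20140923", "20150926"),
])

# Assumption: `dtype` (singular) is honoured by read_fwf in this pandas version.
df = pd.read_fwf(
    StringIO(SAMPLE),
    colspecs=COLSPECS,
    dtype={"USAF": str, "WBAN": str},
)
print(df["USAF"].tolist())

# For contrast: let pandas infer the types, then convert afterwards.
inferred = pd.read_fwf(StringIO(SAMPLE), colspecs=COLSPECS)
late = inferred["USAF"].astype(str)
print(late.tolist())
```

If `dtype` is honoured, the first list should keep the zero-padded station IDs, while the second should show them without the leading zeros, which is the loss the answer warns about. The `converters` approach shown above remains the safer choice on versions where `dtype` is not applied by `read_fwf`.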