Question

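Using Python 2.7.5 and pandas 0.12.0, I am trying to import fixed-width text files into a DataFrame with 'pd.io.parsers.read_fwf()'. The values I am importing are all numeric, but it is important that leading zeros be preserved, so I would like to specify the dtype as string rather than int.

According to the documentation for this function, read_fwf supports the dtype attribute, but when I try to use it: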

data= pd.io.parsers.read_fwf(file, colspecs = ([79,81], [87,90]), header = None, dtype = {0: np.str, 1: np.str})

I get the error:

ValueError: dtype is not supported with python-fwf parser

I have tried as many variations of 'dtype = something' as I can think of, but they all return the same message.

Any help would be much appreciated!

Answer 1

Instead of specifying dtypes, specify a converter for the column you want to keep as str, building on @TomAugspurger's example:

from io import StringIO
import pandas as pd
data = StringIO(u"""
121301234
121300123
121300012
""")

pd.read_fwf(data, colspecs=[(0,3),(4,8)], converters = {1: str})

results in

    \n Unnamed: 1
0  121       0123
1  121       0012
2  121       0001

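Converters are a mapping from a column name or index to a function that converts the values in that column's cells (e.g. int converts them to integers, float to floats, etc.).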
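
As a minimal sketch of the same idea applied to a 5-digit field like the one in the question: the sample data, the column boundaries, and header=None below are assumptions chosen for illustration, not the asker's real file layout.

```python
from io import StringIO
import pandas as pd

# Hypothetical sample: three fixed-width records, no header row.
raw = "121301234\n121300123\n121300012\n"

# Column 0 is left to normal type inference; column 1 is passed through str,
# so its text is kept exactly as read, leading zeros included.
df = pd.read_fwf(
    StringIO(raw),
    colspecs=[(0, 4), (4, 9)],
    header=None,
    converters={1: str},
)

print(df[1].tolist())
```

Columns without a converter are still type-inferred, so column 0 comes back as integers here while column 1 stays text.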

Answer 2

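The documentation is probably incorrect there. I think the same base docstring is used for several readers. As for a workaround, since you know the widths ahead of time, I think you can add the zeros back after the fact.

With this file and widths [4,5]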

121301234
121300123
121300012

we get:

In [38]: df = pd.read_fwf('tst.fwf', widths=[4,5], header=None)

In [39]: df
Out[39]: 
      0     1
0  1213  1234
1  1213   123
2  1213    12

To fill in the missing zeros, would this work?

In [45]: df[1] = df[1].astype('str')

In [53]: df[1] = df[1].apply(lambda x: ''.join(['0'] * (5 - len(x))) + x)

In [54]: df
Out[54]: 
      0      1
0  1213  01234
1  1213  00123
2  1213  00012

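The 5 in the lambda above comes from the correct width. You would need to select all the columns that need leading zeros and apply the function (with the correct width) to each one.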
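
If several columns need this treatment, one possible way to write it is with the string method zfill instead of the lambda. This is only a sketch: the frame below is a hand-built stand-in for the parsed result, and pad_widths is a hypothetical mapping you would fill in from your own column widths.

```python
import pandas as pd

# Hypothetical frame mirroring the parsed result above (zeros already lost).
df = pd.DataFrame({0: [1213, 1213, 1213], 1: [1234, 123, 12]})

# Hypothetical mapping: column label -> fixed width of that field.
pad_widths = {1: 5}

for col, width in pad_widths.items():
    # zfill left-pads each string with '0' until it is `width` characters long.
    df[col] = df[col].astype(str).str.zfill(width)

print(df)
```

Note that this only restores zeros correctly when every value in the column is supposed to have the same fixed width.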

Answer 3

This works fine after pandas version 0.20.2.

from io import StringIO
import pandas as pd
data = StringIO(u"""
121301234
121300123
121300012
""")
# np.str was only an alias for the builtin str and was removed in NumPy 1.24,
# so the builtin is used here instead.
pd.read_fwf(data, colspecs=[(0,3),(4,8)], header = None, dtype = {0: str, 1: str})

Output:

     0     1
0  NaN   NaN
1  121  0123
2  121  0012
3  121  0001
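
For the two-column layout used in Answer 2, a sketch of the dtype route might look like the following. It assumes a pandas version in which read_fwf accepts dtype, and the in-memory sample and widths are illustrative stand-ins for the real file.

```python
from io import StringIO
import pandas as pd

# Hypothetical sample with no leading blank line and no header row.
raw = "121301234\n121300123\n121300012\n"

# dtype=str asks for every column to be kept as text,
# so no numeric inference strips the leading zeros.
df = pd.read_fwf(StringIO(raw), widths=[4, 5], header=None, dtype=str)

print(df)
```

Passing a dict such as {1: str} instead would keep only that column as text and leave the others to be inferred.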
