pandas read_fwf does not load special characters correctly

Asked: 2017-01-11 01:29:18

Tags: python pandas

I have the following data in test.txt:

étoufee
placing

and the following code:

import pandas as pd
import numpy as np

widths = [4,3]
names = ["part1", "part2"]

df = pd.read_fwf('test.txt',widths = widths, names = names, encoding = 'utf8')
print df

The output is:

  part1 part2
0   éto   ufe
1  plac   ing

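Note the first row. The special character causes read_fwf to measure the field widths incorrectly, and we lose data: the final "e" of "étoufee" is dropped. I tried setting encoding = 'utf-8', but that did not help. Are there any other options?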
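The question's code uses the Python 2 print statement, and the output shown is what you would get if the widths were counted in bytes rather than characters: in UTF-8, "é" occupies two bytes, so a 4-byte first field holds only "éto". Here is a minimal sketch of that effect in plain Python 3. It illustrates the arithmetic only and is not a trace of what pandas does internally; `slice_by_widths` is a hypothetical helper written for this example.

```python
# "étoufee", with é written as an escape so the source file encoding cannot matter
line = "\u00e9toufee"
raw = line.encode("utf-8")

widths = [4, 3]


def slice_by_widths(data, widths):
    """Cut `data` (str or bytes) into consecutive fields of the given widths."""
    fields, start = [], 0
    for w in widths:
        fields.append(data[start:start + w])
        start += w
    return fields


# Widths counted in characters: the intended split.
char_fields = slice_by_widths(line, widths)

# Widths counted in bytes: é uses two of the first four bytes,
# so everything after it shifts left by one character.
byte_fields = [b.decode("utf-8") for b in slice_by_widths(raw, widths)]

print(len(line), len(raw))  # 7 8
print(char_fields)          # ['étou', 'fee']
print(byte_fields)          # ['éto', 'ufe']
```

The byte-based result matches the faulty first row above, which suggests the fix is to make sure the text is decoded before the widths are applied.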

For anyone who finds this in the future, here is the updated code, which decodes the file before handing it to read_fwf:

# encoding=utf8

import codecs
from io import StringIO

import pandas as pd

# Decode the file to text first, so read_fwf receives characters rather than raw bytes.
with codecs.open('test.txt', 'r', encoding='utf8') as f:
    text = f.read()

widths = [4, 3]
names = ["part1", "part2"]

df = pd.read_fwf(StringIO(text), widths=widths, names=names, encoding='utf8')
print(df)
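As a pandas-free cross-check of what the two columns should contain, the same decode-first idea can be sketched with the standard library alone. This assumes Python 3 and a UTF-8 encoded file; `split_fixed_width` is a hypothetical helper, not part of pandas, and the temporary file merely stands in for test.txt.

```python
import io
import os
import tempfile


def split_fixed_width(path, widths, encoding="utf-8"):
    """Decode the file first, then cut each line by character widths."""
    rows = []
    with io.open(path, "r", encoding=encoding) as f:
        for line in f:
            line = line.rstrip("\r\n")
            row, start = [], 0
            for w in widths:
                row.append(line[start:start + w])
                start += w
            rows.append(row)
    return rows


# Write a stand-in for test.txt so the example is self-contained.
fd, path = tempfile.mkstemp(suffix=".txt")
with io.open(fd, "w", encoding="utf-8") as f:
    f.write("\u00e9toufee\nplacing\n")

rows = split_fixed_width(path, [4, 3])
os.remove(path)

print(rows)  # [['étou', 'fee'], ['plac', 'ing']]
```

If a DataFrame built from the real file disagrees with this list, the file is probably being split on bytes somewhere along the way.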

1 answer:

Answer 0: (score: 1)

Not an answer, but it may be useful:

txt = """étoufee
placing"""

import pandas as pd
import numpy as np
from io import StringIO

widths = [4,3]
names = ["part1", "part2"]

df = pd.read_fwf(StringIO(txt),widths = widths, names = names, encoding = 'utf8')
print(df)

  part1 part2
0  étou   fee
1  plac   ing

The environment this was run in:

import sys, locale
print(sys.version)
print(pd.__version__)
print(sys.getfilesystemencoding())
print(sys.getdefaultencoding())
print(locale.getlocale())

3.5.2 |Anaconda custom (x86_64)| (default, Jul  2 2016, 17:52:12) 
[GCC 4.2.1 Compatible Apple LLVM 4.2 (clang-425.0.28)]
0.19.0
utf-8
utf-8
('en_US', 'UTF-8')