I have the following data in test.txt:
étoufee
placing
and the following code:
import pandas as pd
import numpy as np
widths = [4,3]
names = ["part1", "part2"]
df = pd.read_fwf('test.txt',widths = widths, names = names, encoding = 'utf8')
print df
which outputs:
  part1 part2
0   éto   ufe
1  plac   ing
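Note the first row. The special character causes read_fwf to miscount the field widths, so we lose data: the é seems to take up two positions, which shifts every field after it and drops the final "e". I tried setting encoding='utf-8', but it made no difference. Are there any other options?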
For anyone who finds this in the future, here is the updated code:
# encoding=utf8
import pandas as pd
import numpy as np
from io import StringIO
import sys, locale
import codecs
with codecs.open('test.txt','r',encoding='utf8') as f:
    text = f.read()
widths = [4,3]
names = ["part1", "part2"]
df = pd.read_fwf(StringIO(text),widths = widths, names = names, encoding = 'utf8')
print(df)
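A likely reason the updated code behaves differently is that the file is decoded to text before read_fwf sees it, so the widths count characters rather than bytes. That is an inference from the outputs in this thread, not something confirmed against the pandas source. The sketch below uses only the standard library and assumes Python 3; the helper name slice_by_widths is made up for illustration.

```python
# Illustrative only: compare cutting a line by bytes with cutting it by characters.
line = "étoufee"
raw = line.encode("utf-8")  # "é" becomes two bytes, so raw is one byte longer than line

widths = [4, 3]

def slice_by_widths(seq, widths):
    """Cut seq (str or bytes) into consecutive pieces of the given widths."""
    pieces, start = [], 0
    for width in widths:
        pieces.append(seq[start:start + width])
        start += width
    return pieces

# Byte-based slicing: decoding each piece works here only because no cut
# happens to fall in the middle of a multi-byte character.
byte_fields = [piece.decode("utf-8") for piece in slice_by_widths(raw, widths)]

# Character-based slicing on the decoded text.
char_fields = slice_by_widths(line, widths)

print(len(line), len(raw))
print(byte_fields)
print(char_fields)
```

The byte-based fields match the broken output in the question, and the character-based fields match the output reported in the answer.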
Answer 0 (score: 1)
This is not an answer, but it may be useful.
txt = """étoufee
placing"""
import pandas as pd
import numpy as np
from io import StringIO
widths = [4,3]
names = ["part1", "part2"]
df = pd.read_fwf(StringIO(txt),widths = widths, names = names, encoding = 'utf8')
print(df)
  part1 part2
0  étou   fee
1  plac   ing
import sys, locale
print(sys.version)
print(pd.__version__)
print(sys.getfilesystemencoding())
print(sys.getdefaultencoding())
print(locale.getlocale())
3.5.2 |Anaconda custom (x86_64)| (default, Jul 2 2016, 17:52:12)
[GCC 4.2.1 Compatible Apple LLVM 4.2 (clang-425.0.28)]
0.19.0
utf-8
utf-8
('en_US', 'UTF-8')
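In an environment like the one reported above (Python 3 with UTF-8 defaults), text read with an explicit encoding is already a sequence of characters. As a rough cross-check that does not depend on pandas at all, the file can be decoded and cut by hand. This is only a sketch: split_fixed_width is a hypothetical helper, and the temporary file stands in for test.txt.

```python
import io
import os
import tempfile

def split_fixed_width(line, widths):
    """Cut one decoded line into fields of the given character widths."""
    fields, start = [], 0
    for width in widths:
        fields.append(line[start:start + width].strip())
        start += width
    return fields

# Write the sample data from the question to a temporary stand-in for test.txt.
path = os.path.join(tempfile.mkdtemp(), "test.txt")
with io.open(path, "w", encoding="utf-8") as f:
    f.write(u"étoufee\nplacing\n")

widths = [4, 3]

# Decode while reading, so every slice below counts characters, not bytes.
with io.open(path, "r", encoding="utf-8") as f:
    rows = [split_fixed_width(line.rstrip("\n"), widths) for line in f]

print(rows)
```

If these rows disagree with what read_fwf returns for the same file, that points at how the input is being decoded rather than at the widths themselves.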