Edit:
I found partial answers here:
https://stackoverflow.com/a/26551913/2230844
https://stackoverflow.com/a/15026839/2230844
How can I read an ASCII-formatted table like this one into pandas?
----------------------------------------------------
| col1 col2 col3 col4 |
------------ ------------ ------------ -------------
1002 0.402397E-01 0.883513E-02 0.450885E-01 0.118748E-02
1003 0.105235 0.474509E-02 0.118508 0.168397E-03
1004 0.102625 0.225842E-02 0.317864E-02 0.997383
1 0.603750 0.475112E-01 0.679590 0.114713E-02
2 0.534171E-01 0.119815E-01 0.600187E-01 0.830949E-04
3 0.283291E-01 0.119353E-01 0.317530E-01 0.243996E-04
104 0.739759E-02 0.463873E-02 0.827061E-02 0.145207E-05
-----------------------------------------------------
I noticed the answers that use read_fwf(), but they require specifying the column widths manually.
Answer 0 (score: 4)
Assuming your ASCII data is in the string x:
In [1099]: x
Out[1099]: ' ----------------------------------------------------\n | col1 col2 col3 col4 |\n ------------ ------------ ------------ -------------\n 1002 0.402397E-01 0.883513E-02 0.450885E-01 0.118748E-02\n 1003 0.105235 0.474509E-02 0.118508 0.168397E-03\n 1004 0.102625 0.225842E-02 0.317864E-02 0.997383 \n 1 0.603750 0.475112E-01 0.679590 0.114713E-02\n 2 0.534171E-01 0.119815E-01 0.600187E-01 0.830949E-04\n 3 0.283291E-01 0.119353E-01 0.317530E-01 0.243996E-04\n 104 0.739759E-02 0.463873E-02 0.827061E-02 0.145207E-05\n -----------------------------------------------------'
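Some of the options provided by pd.read_csv can help you get this data into a DataFrame (in Python 3, StringIO comes from the io module):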
In [1123]: pd.read_csv(StringIO(x), sep=' ', skipfooter=1, skiprows=1, skipinitialspace=True).drop([0])
Out[1123]:
| col1 col2 col3 col4 |.1
1 1002 0.402397E-01 0.883513E-02 0.450885E-01 0.001187 NaN
2 1003 0.105235 0.474509E-02 0.118508 0.000168 NaN
3 1004 0.102625 0.225842E-02 0.317864E-02 0.997383 NaN
4 1 0.603750 0.475112E-01 0.679590 0.001147 NaN
5 2 0.534171E-01 0.119815E-01 0.600187E-01 0.000083 NaN
6 3 0.283291E-01 0.119353E-01 0.317530E-01 0.000024 NaN
7 104 0.739759E-02 0.463873E-02 0.827061E-02 0.000001 NaN
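The frame above still carries the two `|` columns, and most of the values appear to be kept as strings. As a hedged alternative (a sketch, not the answerer's method), one could drop the rule and header lines before parsing and supply the column names by hand. The names `raw`, `rows`, and `id` are assumptions chosen for illustration, and the sample is shortened to three data rows from the question.

```python
from io import StringIO

import pandas as pd

# Shortened copy of the table from the question (three data rows only).
raw = """\
 ----------------------------------------------------
 |      col1         col2         col3         col4 |
 ------------ ------------ ------------ -------------
 1002 0.402397E-01 0.883513E-02 0.450885E-01 0.118748E-02
 1003 0.105235 0.474509E-02 0.118508 0.168397E-03
  104 0.739759E-02 0.463873E-02 0.827061E-02 0.145207E-05
 -----------------------------------------------------
"""

# Keep only data rows: skip blank lines, dashed rules, and the boxed header.
rows = [
    line
    for line in raw.splitlines()
    if line.strip() and not line.lstrip().startswith(("-", "|"))
]

# Split on runs of whitespace. "id" is a made-up name for the unlabeled
# first column, which becomes the index.
df = pd.read_csv(
    StringIO("\n".join(rows)),
    sep=r"\s+",
    header=None,
    names=["id", "col1", "col2", "col3", "col4"],
    index_col="id",
)

print(df.dtypes)
```

Because only numeric rows reach the parser, the four value columns should come out as floats rather than strings. One caveat of this filter: a data row whose first field is a negative number would also start with "-" and be dropped, so the test would need tightening for such data.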