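When I try to read the data below with loadtxt('RSTN'), I get an error. I then tried genfromtxt('RSTN',delimiter=' ') to fill in the missing data, but I got an error like:
Line #31112 (got 7 columns instead of 8)
What I actually want is to fill the missing values with nan or something similar.
The data is in an ASCII file named RSTN and looks like this:
20120127165126 19 42 54 91 113 147 188 284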
20120127165127 19 42 54 91 113 147 188 284
20120127165128 19 42 54 90 113 147 188 284
20120127165129 19 42 54 90 113 147 188 284
20120127165130 19 42 54 88 107 131 155 235
20120127165131 19 42 54 72 79 79 92 154
20120127165132 19 42 54 45 43 42 50 97
20120127165133 19 42 54 24 21 21 25 65
20120127165134 19 42 54 11 8 9 12 46
20120127165135 19 42 54 5 2 3 7 35
20120127165136 18 42 54 2 0 1 4 29
20120127165137 19 42 54 0 0 2 25
20120127165138 19 42 53 0 0 1 22
20120127165139 19 42 54 0 0 1 19
20120127165140 19 42 54 0 0 0 17
20120127165141 19 42 54 0 0 0 14
20120127165142 19 42 54 0 0 0 14
20120127165143 19 42 54 0 0 0 14
20120127165144 19 42 54 0 0 13
20120127165145 19 42 54 0 0 14
20120127165146 19 42 54 0 0 0 14
20120127165147 19 42 54 0 0 1 15
20120127165148 19 42 54 0 0 1 15
20120127165149 19 42 54 0 0 1 15
20120127165150 20 42 53 0 1 15
20120127165151 20 42 53 0 1 17
20120127165152 20 42 53 0 1 17
20120127165153 19 42 53 0 0 1 17
20120127165154 20 42 53 0 1 17
20120127165155 20 42 53 0 1 17
20120127165156 20 42 53 0 0 1 17
20120127165157 19 42 54 0 0 1 17
20120127165158 19 42 55 0 0 1 17
20120127165159 19 42 55 0 0 1 17
20120127165200 20 42 56 0 0 1 17
20120127165201 21 42 56 0 0 1 17
I tried this:
from pandas import *
data=read_fwf('26JAN12.K7O', colspecs='infer', header=None)
and I got:
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "C:\Anaconda\lib\site-packages\pandas\io\parsers.py", line 429, in read_fwf
    return _read(filepath_or_buffer, kwds)
  File "C:\Anaconda\lib\site-packages\pandas\io\parsers.py", line 198, in _read
    parser = TextFileReader(filepath_or_buffer, **kwds)
  File "C:\Anaconda\lib\site-packages\pandas\io\parsers.py", line 479, in __init__
    self._make_engine(self.engine)
  File "C:\Anaconda\lib\site-packages\pandas\io\parsers.py", line 592, in _make_engine
    self._engine = klass(self.f, **self.options)
  File "C:\Anaconda\lib\site-packages\pandas\io\parsers.py", line 1954, in __init__
    PythonParser.__init__(self, f, **kwds)
  File "C:\Anaconda\lib\site-packages\pandas\io\parsers.py", line 1237, in __init__
    self._make_reader(f)
  File "C:\Anaconda\lib\site-packages\pandas\io\parsers.py", line 1957, in _make_reader
    self.data = FixedWidthReader(f, self.colspecs, self.delimiter)
  File "C:\Anaconda\lib\site-packages\pandas\io\parsers.py", line 1933, in __init__
    raise AssertionError()
AssertionError
Answer 0 (score: 1)
If you have pandas, you can parse the file with pd.read_fwf:
import pandas as pd
df = pd.read_fwf('data', colspecs='infer', header=None, parse_dates=[[0]])
print(df)
which yields
0 1 2 3 4 5 6 7 8
0 2012-01-27 16:51:26 19 42 54 91 113 147 188 284
1 2012-01-27 16:51:27 19 42 54 91 113 147 188 284
...
11 2012-01-27 16:51:37 19 42 54 0 NaN 0 2 25
12 2012-01-27 16:51:38 19 42 53 0 NaN 0 1 22
13 2012-01-27 16:51:39 19 42 54 0 NaN 0 1 19
[36 rows x 9 columns]
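If colspecs='infer' guesses the column boundaries wrongly, the widths can be spelled out instead. The following is only a sketch under assumptions: the layout (an 18-character timestamp field followed by eight 7-character fields) and the in-memory sample rows are made up for illustration and should be checked against the real file. The timestamp is converted explicitly with pd.to_datetime rather than through parse_dates.

```python
import io
import pandas as pd

# Hypothetical sample: an 18-character timestamp field followed by
# eight 7-character fields; a blank field stands for a missing value.
rows = [
    ("20120127165136", [18, 42, 54, 2, 0, 1, 4, 29]),
    ("20120127165137", [19, 42, 54, 0, "", 0, 2, 25]),
    ("20120127165138", [19, 42, 53, 0, "", 0, 1, 22]),
]
text = "\n".join(
    f"{ts:<18}" + "".join(f"{v:>7}" for v in vals) for ts, vals in rows
) + "\n"

# Explicit widths instead of colspecs='infer' (assumed layout, see above).
df = pd.read_fwf(io.StringIO(text), widths=[18] + [7] * 8, header=None)

# Column 0 is read as an integer; convert it to a real timestamp.
df[0] = pd.to_datetime(df[0].astype(str), format="%Y%m%d%H%M%S")
print(df)
```

With a real file, the StringIO object would be replaced by the file name; blank fields come back as NaN in the affected column.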
Alternatively, thanks to DSM, with np.genfromtxt you can parse fixed-width data by passing a list of field widths as the delimiter argument:
import numpy as np
np.set_printoptions(formatter={'float':'{:g}'.format})
arr = np.genfromtxt('data', delimiter=[18]+[7]*8)
print(arr)
which yields
[[2.01201e+13 19 42 54 91 113 147 188 284]
[2.01201e+13 19 42 54 91 113 147 188 284]
[2.01201e+13 19 42 54 90 113 147 188 284]
...
[2.01201e+13 19 42 54 0 nan 0 2 25]
[2.01201e+13 19 42 53 0 nan 0 1 22]
[2.01201e+13 19 42 54 0 nan 0 1 19]
...]
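To see why fixed widths help where whitespace splitting does not, here is a small self-contained sketch. It assumes the same layout as the widths above (18 characters, then eight fields of 7 characters); the sample rows are invented for illustration. Splitting on whitespace only reveals that a value is missing, not which column it belongs to, whereas slicing by width keeps every value in its column.

```python
import io
import numpy as np

# Hypothetical fixed-width sample: 18-character timestamp field,
# then eight 7-character fields; the second row has one blank field.
rows = [
    ("20120127165136", [18, 42, 54, 2, 0, 1, 4, 29]),
    ("20120127165137", [19, 42, 54, 0, "", 0, 2, 25]),
]
lines = [f"{ts:<18}" + "".join(f"{v:>7}" for v in vals) for ts, vals in rows]

# Whitespace splitting loses the position of the blank field.
field_counts = [len(line.split()) for line in lines]
print(field_counts)  # the second row has one field fewer than the first

# Fixed-width parsing keeps the columns aligned and fills the gap with nan.
arr = np.genfromtxt(io.StringIO("\n".join(lines) + "\n"),
                    delimiter=[18] + [7] * 8)
print(arr)
```

Note that the timestamp ends up as a float in the array, so it is only useful as a number here; if real datetimes are needed, the pandas approach is probably more convenient.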
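Answer 1 (score: 0)
I ran into a similar problem reading a tab-delimited file with missing data. If you can get your data in tab-delimited format, you can use the following: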
import pandas as pd
df = pd.read_csv('RSTN', sep='\t', header = None)
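As a quick check of this behaviour, here is a sketch with an in-memory tab-separated sample (the rows are invented for illustration). Two consecutive tabs mark an empty field, which read_csv turns into NaN.

```python
import io
import pandas as pd

# Hypothetical tab-separated sample; the second row has an empty
# sixth field (two tabs in a row).
text = (
    "20120127165136\t18\t42\t54\t2\t0\t1\t4\t29\n"
    "20120127165137\t19\t42\t54\t0\t\t0\t2\t25\n"
)

df_tab = pd.read_csv(io.StringIO(text), sep="\t", header=None)
print(df_tab)
```

This only works if the file really uses tabs between fields; a file padded with spaces, like the one in the question, needs a fixed-width reader instead.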