I have an ASCII file containing three lines of data, as shown below:
Timestamp: 00:47:14 SATID 13 VAL1 28 VAL2 227 SIGNAL 37 SATID 15 VAL1 22 VAL2 265 SIGNAL 30 SATID 16 VAL1 22 VAL2 265 SIGNAL 30
Timestamp: 00:48:14 SATID 13 VAL1 28 VAL2 227 SIGNAL 37 SATID 15 VAL1 22 VAL2 265 SIGNAL nan SATID 16 VAL1 22 VAL2 265 SIGNAL 30
Timestamp: 00:49:14 SATID 14 VAL1 22 VAL2 265 SIGNAL 30
(See the image for the original format.) When I try to read it into Python, I get the following error:
time,sat1,sat2,sat3,sat4 = np.loadtxt("test1.asc", usecols=(1,9,17,25,33), unpack=True, converters = {1: strpdate2num("%H:%M:%S")})
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/System/Library/Frameworks/Python.framework/Versions/2.7/Extras/lib/python/numpy/lib/npyio.py", line 839, in loadtxt
    vals = [vals[i] for i in usecols]
IndexError: list index out of range
Does anyone know how I can make Python ignore the empty cells and read the data that is available in each column?
Thanks!
Answer 0: (score: 0)
Without reaching for numpy or pandas, let's see how we would read this "manually".
First, note that the timestamp is always in the same position and is followed by " SATID ", so you can use .split(' SATID ')[0] to get that information.
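Then, if you apply .split(' SATID ') to the line and keep everything after the first piece, you get one record per satellite, which you can split further on ' VAL1 ', ' VAL2 ' and ' SIGNAL '.
It looks like this: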
raw_data = ["Timestamp: 00:47:14 SATID 13 VAL1 28 VAL2 227 SIGNAL 37 SATID 15 VAL1 22 VAL2 265 SIGNAL 30 SATID 16 VAL1 22 VAL2 265 SIGNAL 30",
"Timestamp: 00:48:14 SATID 13 VAL1 28 VAL2 227 SIGNAL 37 SATID 15 VAL1 22 VAL2 265 SIGNAL nan SATID 16 VAL1 22 VAL2 265 SIGNAL 30",
"Timestamp: 00:49:14 SATID 14 VAL1 22 VAL2 265 SIGNAL 30"]
output = []
for line in raw_data:
    if 'SATID' in line:  # making sure it is not an empty line
        timestamp = line.split(' SATID ')[0].split('Timestamp: ')[1].rstrip(' ')
        data = line.split(' SATID ')[1:]
        for record in data:
            if 'VAL1' in record:  # making sure it is not an empty record
                satid = record.split(' VAL1 ')[0]
                val1 = record.split(' VAL1 ')[1].split(' VAL2 ')[0]
                val2 = record.split(' VAL2 ')[1].split(' SIGNAL ')[0]
                signal = record.split(' SIGNAL ')[1].rstrip(' ')
                output.append({'Timestamp': timestamp,
                               'SATID': satid,
                               'VAL1': val1,
                               'VAL2': val2,
                               'SIGNAL': signal})

# output is now a list of dictionaries
for d in output:
    print(d)
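As a possible variation on the approach above, the same records can be pulled out with a regular expression and grouped per timestamp, so that a satellite missing from a line simply has no entry. This is only a sketch: the two patterns, the function name `parse_line`, and the choice to convert `SIGNAL` to a float (so that the text `nan` becomes a real NaN) are my assumptions, not part of the answer above.

```python
import re

# Assumed record layout: "SATID <id> VAL1 <v1> VAL2 <v2> SIGNAL <sig>"
RECORD = re.compile(r"SATID\s+(\S+)\s+VAL1\s+(\S+)\s+VAL2\s+(\S+)\s+SIGNAL\s+(\S+)")
TIMESTAMP = re.compile(r"Timestamp:\s*(\S+)")


def parse_line(line):
    """Return (timestamp, {satid: signal}) for one line, or None if it has no timestamp."""
    match = TIMESTAMP.search(line)
    if match is None:
        return None
    signals = {}
    for satid, _val1, _val2, signal in RECORD.findall(line):
        signals[int(satid)] = float(signal)  # float("nan") gives a NaN value
    return match.group(1), signals


lines = [
    "Timestamp: 00:47:14 SATID 13 VAL1 28 VAL2 227 SIGNAL 37 SATID 15 VAL1 22 VAL2 265 SIGNAL 30",
    "Timestamp: 00:48:14 SATID 13 VAL1 28 VAL2 227 SIGNAL 37 SATID 15 VAL1 22 VAL2 265 SIGNAL nan",
    "Timestamp: 00:49:14 SATID 14 VAL1 22 VAL2 265 SIGNAL 30",
]

# Map each timestamp to the signals that were actually present on that line.
table = {}
for line in lines:
    parsed = parse_line(line)
    if parsed is not None:
        timestamp, signals = parsed
        table[timestamp] = signals

for timestamp, signals in table.items():
    print(timestamp, signals)
```

Looking up a satellite with `table[timestamp].get(satid)` then returns `None` when that satellite was absent from the line, which keeps "missing" distinct from a reported `nan`.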
Answer 1: (score: 0)
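Since the column boundaries do not overlap, you can treat the file as a fixed-width file and use the pandas function read_fwf. You need to prepare a list of column specifications: a list of tuples giving the start position of each column and the position just past its end. Here is the beginning of the specs (tedious, but you only need to do it once):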
import pandas as pd

specs = [(0,11),(11,20),(20,26),(26,29),(29,33),(33,37),
         (37,42),(42,45),(45,52),(52,55),(55,61),(61,63)]
pd.read_fwf('foo.txt', header=None, colspecs=specs)
#            0         1      2     3     4     5     6      7       8     9  \
#0  Timestamp:  00:47:14  SATID  13.0  VAL1  28.0  VAL2  227.0  SIGNAL  37.0
#1  Timestamp:  00:48:14  SATID  13.0  VAL1  28.0  VAL2  227.0  SIGNAL  37.0
#2  Timestamp:  00:49:14    NaN   NaN   NaN   NaN   NaN    NaN     NaN   NaN
#
#      10   11
#0    NaN  NaN
#1    NaN  NaN
#2  SATID  1.0
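To check what a column spec selects before handing it to read_fwf, one can slice a sample line by hand. The colspecs are half-open intervals, so each pair behaves like a Python slice. The sketch below is illustrative only: `slice_fields` is a hypothetical helper, and the sample line is the single-spaced text from the question, which may not match the spacing of the real file.

```python
def slice_fields(line, specs):
    """Cut `line` at each half-open (start, end) pair and strip the padding."""
    return [line[start:end].strip() for start, end in specs]


# First ten specs from the answer above. The sample line is an assumption:
# it is the single-spaced version of the first row shown in the question.
specs = [(0, 11), (11, 20), (20, 26), (26, 29), (29, 33),
         (33, 37), (37, 42), (42, 45), (45, 52), (52, 55)]
line = "Timestamp: 00:47:14 SATID 13 VAL1 28 VAL2 227 SIGNAL 37"

fields = slice_fields(line, specs)
print(fields)
```

If a printed field looks truncated or runs into its neighbour, the corresponding pair in `specs` needs adjusting before the file is read.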