How to extract data from a text file into a 2D array in Python

Time: 2019-07-16 22:47:56

Tags: python-3.x parsing data-structures

I'm new to Python programming and have run into some trouble. I have a text file (.dat) organized like this:

 {
  "token1": [Array of numbers], // metadata, that has to be ignored    
  "token2": 5000,
  "token3": 16.8,
  "token4": -7118,
  "token5": "2017-11-12 15:38:50",
  "token6": false,
  "token7": ["LowHor", "LowVer", "HighHor", "HighVer"],
  "token8": "RadarID-3",
  ...
}, ... 50 examples   

//

import re

# Read the whole file; the with-block closes it automatically
with open('bird_2017-11-12_15-38-42.dat') as f:
    text = f.read()
keywords = ['Ceil_H_m', 'Ceil_Vx_mps', 'Ceil_Vy_mps', 'Ceil_Vz_mps', 
'Ceil_X_m', 'Ceil_Y_m', 'DateTimeCeil', 'DateTimeFile', 'IsCeilInMeteo', 
'IsCeilInNoises', 'Lambda_m', 'NamesChannels', 'NumChannels', 
'NumRangesPack', 'NumRaysPack', 'POI_Az_deg', 'POI_Height_m', 
'POI_Range_m', 'RadarID']
samples = text.count('TrackNumber')  # metadata that every example has
data = []

//

I need a two-dimensional array as output, like this:
number of example    0                  1  ............ 50
----------------------------------------------------------
properties
token2             5000            
token3             16.8
token4            -7118
token5          2017-11-12 15:38:50
token6             false
token7         ["LowHor", "LowVer", "HighHor", "HighVer"]
token8            RadarID-3

The keywords are actually the tokens shown above. I tried to use these keywords to extract each token's value with re.match(), but without success.

2 answers:

Answer 0 (score: 0)

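It looks like your input file may be almost JSON. Specifically, if you wrap the text of the input file in square brackets, the result may be syntactically a JSON array. If so, this gets you most of what you need: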

import json, collections

with open('bird_2017-11-12_15-38-42.dat') as f:
    file_text = f.read()
json_text = '[' + file_text + ']'
examples = json.loads(json_text)

transpose = collections.defaultdict(list)
for example in examples:
    for (keyword, value) in example.items():
        if keyword == 'token1':
            # metadata that has to be ignored
            continue
        transpose[keyword].append(value)

for (keyword, values) in transpose.items():
    print(keyword, values)

This assumes that every example has exactly the same set of keywords. If that is not the case, the code will need to be modified.
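If the examples do not all share the same keywords, one possible modification is to collect the union of keywords first and fill the gaps with a placeholder. This is only a sketch: the inline `file_text` sample is made up for illustration, and using `None` for a missing value is an assumption, not something the question specifies.

```python
import json

# Hypothetical sample standing in for the file text;
# the second example deliberately lacks "token3".
file_text = '''
{"token1": [1, 2], "token2": 5000, "token3": 16.8},
{"token1": [3, 4], "token2": 5001}
'''

examples = json.loads('[' + file_text + ']')

# Collect every keyword seen in any example, in first-seen order,
# skipping the metadata that has to be ignored.
keywords = []
for example in examples:
    for keyword in example:
        if keyword != 'token1' and keyword not in keywords:
            keywords.append(keyword)

# One list per keyword, one entry per example; None marks a missing value.
transpose = {k: [example.get(k) for example in examples] for k in keywords}

for keyword, values in transpose.items():
    print(keyword, values)
```

With this layout every list has the same length as `examples`, so column `j` always refers to example `j` even when some values are absent.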

Answer 1 (score: 0)

It looks like your data is in JSON format; you only need to wrap it in [] to turn it into a list.
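One caveat: the sample in the question shows a trailing `// ...` annotation, and `json.loads` rejects comments. If the real file contains such annotations (it may not; they could just be the asker's notes), they would have to be removed first. A minimal sketch, assuming only `//` line comments occur; the `raw` text and the helper name `strip_line_comments` are hypothetical:

```python
import json
import re

# Hypothetical sample: one "//" comment outside a string, one "//" inside a string.
raw = '''
{
  "token1": [1, 2, 3], // metadata, that has to be ignored
  "token5": "2017-11-12 15:38:50",
  "token8": "a//b"
}
'''

# Match string literals first so that "//" inside a string is left alone;
# anything else starting with "//" is dropped up to the end of the line.
PATTERN = re.compile(r'("(?:\\.|[^"\\])*")|//[^\n]*')

def strip_line_comments(text):
    # Keep matched string literals unchanged, replace comments with nothing.
    return PATTERN.sub(lambda m: m.group(1) or '', text)

examples = json.loads('[' + strip_line_comments(raw) + ']')
print(examples[0]['token8'])
```

Matching the string literals in the same pattern is what keeps a value such as `"a//b"` intact; a plain search for `//` would cut it in half.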

Contents of file.txt:

{
"token1": [1, 2, 3],
"token2": 5000,
"token3": 16.8,
"token4": -7118
},
{
"token1": [1, 2, 3],
"token2": 5001,
"token3": 16.9,
"token4": -6118
},
{
"token1": [1, 2, 3],
"token2": 5002,
"token3": 17.8,
"token4": -5118
},
{
"token1": [1, 2, 3],
"token2": 5003,
"token3": 15.8,
"token4": -3118
}

The script could look like this:

import json

with open('file.txt', 'r') as f_in:
    data = f_in.read()

data = json.loads('[' + data + ']')

keys = [*sorted(data[-1].keys())][1:]
columns = [[v for k, v in sorted(d.items())][1:] for d in data]  # [1:] because we don't want the first "token1"

print('{: ^20}'.format('no of example') + ''.join('{: ^20}'.format(i) for i in range(len(columns))))
print('-' * (20 * (len(columns) + 1)))
for v in zip(keys, *columns):
    print(''.join('{: ^20}'.format(i) for i in v))

Prints:

   no of example             0                   1                   2                   3          
----------------------------------------------------------------------------------------------------
       token2               5000                5001                5002                5003        
       token3               16.8                16.9                17.8                15.8        
       token4              -7118               -6118               -5118               -3118
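If an actual two-dimensional structure is wanted rather than printed text, the same parsed data can be arranged as a list of rows. The sketch below assumes every example has the same keys and uses a shortened inline sample (`file_text`) in place of file.txt; the name `table` is chosen here for illustration.

```python
import json

# Shortened stand-in for the contents of file.txt.
file_text = '''
{"token1": [1, 2, 3], "token2": 5000, "token3": 16.8, "token4": -7118},
{"token1": [1, 2, 3], "token2": 5001, "token3": 16.9, "token4": -6118}
'''

data = json.loads('[' + file_text + ']')

# Row labels: every key of the first example except the ignored "token1".
keys = [k for k in data[0] if k != 'token1']

# table[i][j] is the value of keys[i] in example j.
table = [[d[k] for d in data] for k in keys]

for key, row in zip(keys, table):
    print(key, row)
```

Filtering by key name instead of slicing with `[1:]` avoids relying on "token1" sorting first.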