How do I convert hexadecimal values in a text file to an array (Python)?

Asked: 2017-02-15 11:06:12

Tags: python numpy

I have a text file in which each line contains one hexadecimal plaintext. My file looks like this:

7a8e5dc390781eab8df2c090bf4bebca
dbac0fba55d3d4fc177161bfe24dc7fb
82e5a7a021197f6fbe94a867d4bb3895
850580c1ffec887c5000c9b3a0e6b39d
1526af37ce4b0b4e81f8af0647e37119
bab19c53fd86e6afc933276b286e0c36
b52e53007bebe8877ce569ad2494dd76
44fd87e9b1a40a929ab6135665c22d0f
a88e5141ddc99e0207a9e144f4010a22
58ff597819ea0aa37024a1f1f84c5224

I need to convert each line into an array. In other words, my numpy file must look like this:

[
 [7a,8e,5d,c3,90,78,1e,ab,8d,f2,c0,90,bf,4b,eb,ca],
 [db,ac,0f,ba,55,d3,d4,fc,17,71,61,bf,e2,4d,c7,fb],
 .......,
]

import numpy as np
In_path= "/home/msmache/Bureau/testPlaintext.txt"
Out_path= "/home/msmache/Bureau/testPlaintext.npy"

with open(In_path, "r") as In_f:
    all_arrays = []
    Plaintext=[]
    for line in In_f:
        Plaintext=['{:02x}'.format(b) for b in line]
        all_arrays.append(Plaintext)
    print all_arrays   
    with open(Out_path, "wb") as Out_f:
        np.save(Out_path, all_arrays)
data = np.load(Out_path)
print data

This is the error I get:

 Plaintext=['{:02x}'.format(b) for b in line]
 ValueError: Unknown format code 'x' for object of type 'str'

1 answer:

Answer 0 (score: 3)

Plaintext=['{:02x}'.format(b) for b in line] does not work because for b in line yields the characters of the line one at a time, whereas you need two characters per iteration.

Moreover, even if you did take two characters at a time, the hexadecimal format code applies to integer values, not to strings.
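The two problems above can be reproduced in isolation. This is a minimal sketch, assuming Python 3 and using the first line of the sample file; the variable names are illustrative only:

```python
line = "7a8e5dc390781eab8df2c090bf4bebca\n"

# Iterating over a string yields single characters, not two-character pairs.
chars = list(line.strip())[:4]

# The 'x' format code expects an integer; passing a str raises ValueError.
try:
    "{:02x}".format("7")
    error_message = None
except ValueError as exc:
    error_message = str(exc)

# Parsing a two-character slice as base 16 gives an integer,
# which the 'x' format code then accepts.
value = int(line[0:2], 16)
round_trip = "{:02x}".format(value)

print(chars, error_message, value, round_trip)
```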

I suggest building all_arrays with a nested list comprehension that slices each line into two-character pieces:

all_arrays = [[l[i:i+2] for i in range(0,len(l.strip()),2)] for l in In_f]

all_arrays then holds the expected result:

[['7a', '8e', '5d', 'c3', '90', '78', '1e', 'ab', '8d', 'f2', 'c0', '90', 'bf',
  '4b', 'eb', 'ca'],
 ['db', 'ac', '0f', 'ba', '55', 'd3', 'd4', 'fc', '17', '71', '61', 'bf', 'e2',
  '4d', 'c7', 'fb'],
 ['82', 'e5', 'a7', 'a0', '21', '19', '7f', '6f', 'be', '94', 'a8', '67', 'd4',
  'bb', '38', '95'],
 ['85', '05', '80', 'c1', 'ff', 'ec', '88', '7c', '50', '00', 'c9', 'b3', 'a0',
  'e6', 'b3', '9d'],
 ['15', '26', 'af', '37', 'ce', '4b', '0b', '4e', '81', 'f8', 'af', '06', '47',
  'e3', '71', '19'],
 ['ba', 'b1', '9c', '53', 'fd', '86', 'e6', 'af', 'c9', '33', '27', '6b', '28',
  '6e', '0c', '36'],
 ['b5', '2e', '53', '00', '7b', 'eb', 'e8', '87', '7c', 'e5', '69', 'ad', '24',
  '94', 'dd', '76'],
 ['44', 'fd', '87', 'e9', 'b1', 'a4', '0a', '92', '9a', 'b6', '13', '56', '65',
  'c2', '2d', '0f'],
 ['a8', '8e', '51', '41', 'dd', 'c9', '9e', '02', '07', 'a9', 'e1', '44', 'f4',
  '01', '0a', '22'],
 ['58', 'ff', '59', '78', '19', 'ea', '0a', 'a3', '70', '24', 'a1', 'f1', 'f8',
  '4c', '52', '24']]

Note: if you want numeric values instead of hexadecimal strings, simply convert each pair to an integer (base 16):

all_arrays = [[int(l[i:i+2],16) for i in range(0,len(l.strip()),2)] for l in In_f]

You will get something like this:

[[122, 142, 93, 195, 144, 120, 30, 171, 141, 242, 192, 144, 191, 75, 235, 202], [219, 172, 15, 186, 85, ...

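The values are printed in decimal rather than hexadecimal, but that makes no difference once the data is stored in binary form.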
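Putting the pieces together, the whole conversion might look like the sketch below. It assumes Python 3 and a recent NumPy. The helper name hex_lines_to_array, the in-memory sample lines, and the temporary output path are illustrative stand-ins, not part of the question's code. Note that np.save accepts a file path directly, so no surrounding open() call is needed.

```python
import os
import tempfile

import numpy as np


def hex_lines_to_array(lines):
    """Convert an iterable of hex strings into a 2-D uint8 array (hypothetical helper)."""
    rows = []
    for line in lines:
        text = line.strip()
        if not text:
            continue  # skip blank lines
        # Slice the line into two-character pairs and parse each as base 16.
        rows.append([int(text[i:i + 2], 16) for i in range(0, len(text), 2)])
    return np.array(rows, dtype=np.uint8)


# Two lines from the sample file stand in for the real input here;
# an open file object could be passed in the same way.
sample = [
    "7a8e5dc390781eab8df2c090bf4bebca\n",
    "dbac0fba55d3d4fc177161bfe24dc7fb\n",
]
data = hex_lines_to_array(sample)

# Save to a temporary location (assumption: the real path would be Out_path).
out_path = os.path.join(tempfile.mkdtemp(), "testPlaintext.npy")
np.save(out_path, data)
loaded = np.load(out_path)

# The stored integers can be shown as hex again whenever needed.
first_row_hex = [format(int(b), "02x") for b in loaded[0]]
print(loaded.shape, first_row_hex)
```

Storing the rows as uint8 keeps one byte per value, which is why the decimal display does not matter: the hexadecimal form is only a matter of how the numbers are printed.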