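I want to use only the os module to read and write binary files. I run into problems when reading values whose data type is larger than 1 byte, e.g. int64, float32, etc. To illustrate the problem, consider the following example I wrote. I generate random values of type np.float64, each of which takes 8 bytes: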
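For reference, NumPy reports the size of one element of a dtype through np.dtype(...).itemsize. This is the factor that turns a count of values into a count of bytes. Here is a quick check of the sizes mentioned above; the array shape reuses the n and dim from the example:

```python
import numpy as np

# Size in bytes of one element of each dtype mentioned above
for dt in (np.int64, np.float32, np.float64):
    print(np.dtype(dt).name, np.dtype(dt).itemsize)

# An (n, dim) float64 array therefore occupies n * dim * 8 bytes
n, dim = 10, 2
a = np.zeros((n, dim), dtype=np.float64)
print(a.nbytes, len(a.tobytes()))  # 160 160
```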
import os
import numpy as np

# Write
n = 10
dim = 2
fd = os.open('test.dat', os.O_CREAT | os.O_WRONLY)
data_w = np.random.uniform(low=0.5, high=13.3, size=(n,dim)).astype(np.float64)
print("Written Data are:\n%s\n" % data_w)
os.write(fd, data_w.tobytes())
os.close(fd)
print("------------------ \n")
# Read
start_read = 0 # 0 for now. Later I can read from any row!
total_num_to_read = n*dim
fd = os.open('test.dat', os.O_RDONLY)
os.lseek(fd, start_read, 0) # start_read from the beginning 0
raw_data = os.read(fd, total_num_to_read) # How many values to be read
data_r = np.fromiter(raw_data, dtype=np.float64).reshape(-1, dim)
print("Data Read are:\n%s\n" % data_r)
os.close(fd)
The data is not read back correctly. Look at what it returns:
Written Data are:
[[ 2.75763292 9.87883101]
[ 1.73752327 9.9633879 ]
[ 1.01616811 1.81174597]
[ 9.93904659 10.6757686 ]
[ 7.02452029 2.68652109]
[ 5.29766028 11.15384409]
[ 4.12499766 10.37214532]
[11.75811252 3.30378401]
[ 1.72738203 2.11228277]
[ 7.7321937 11.64298051]]
------------------
Data Read are:
[[250. 87.]
[227. 216.]
[161. 15.]
[ 6. 64.]
[162. 178.]
[ 59. 35.]
[246. 193.]
[ 35. 64.]
[218. 97.]
[ 81. 50.]]
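I can't retrieve the values correctly! I thought np.fromiter(raw_data, dtype=np.float64).reshape(-1, dim) would take care of it, but I can't see where the problem is. If I know the data has a specific type (here np.float64), how can I read the binary data back?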
Answer 0 (score: 1)
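You should use np.fromstring(raw_data, dtype=np.float64) instead of fromiter(). Check the documentation to see what each function does. In current NumPy the binary mode of fromstring is deprecated, and np.frombuffer takes the same arguments and does the same job. fromiter() iterates over its input, and iterating over a bytes object yields one integer (0 to 255) per byte. Each byte therefore became a separate float64, which is why the output above contains only whole numbers in that range. Also, read the correct number of bytes from the file, not the number of values! That is 8*total_num_to_read, since each float64 occupies 8 bytes.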
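The sketch below isolates the two bugs on a small hand-picked array rather than the random data from the question. The wrong output can be traced to specific bytes. The first read requested only n*dim = 20 bytes, and fromiter turned each byte into its own value, which then reshaped to 10 rows of 2. On a little-endian machine, the fourth row [6. 64.] is bytes 6 and 7 of the file (0x06, 0x40). Those are the two high-order bytes of the first written value, 2.75763292.

```python
import numpy as np

values = np.array([2.75, 9.87], dtype=np.float64)
raw = values.tobytes()  # 16 bytes: 8 per float64 value

# Bug 1: iterating over bytes yields one int (0-255) per byte,
# so fromiter() turns every single byte into its own float64.
wrong = np.fromiter(raw, dtype=np.float64)
print(wrong.shape)  # (16,)

# Fix: frombuffer() reinterprets each group of 8 bytes as one float64.
right = np.frombuffer(raw, dtype=np.float64)
print(right)  # [2.75 9.87]

# Bug 2: asking for len(values) bytes instead of
# len(values) * itemsize bytes; 2 bytes cannot hold even one float64.
try:
    np.frombuffer(raw[:len(values)], dtype=np.float64)
except ValueError as exc:
    print("ValueError:", exc)
```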
import os
import numpy as np

# Write
n = 10
dim = 2
fd = os.open('test.dat', os.O_CREAT | os.O_WRONLY)
data_w = np.random.uniform(low=0.5, high=13.3, size=(n, dim)).astype(np.float64)
print("Written Data are:\n%s\n" % data_w)
os.write(fd, data_w.tobytes())
os.close(fd)
print("------------------ \n")

# Read
start_read = 0  # 0 for now. Later I can read from any row!
total_num_to_read = n*dim
fd = os.open('test.dat', os.O_RDONLY)
os.lseek(fd, start_read, 0)  # start_read from the beginning 0
raw_data = os.read(fd, 8*total_num_to_read)  # number of BYTES to read: 8 per float64 value
data_r = np.frombuffer(raw_data, dtype=np.float64).reshape(-1, dim)  # reinterpret bytes as float64
print("Data Read are:\n%s\n" % data_r)
os.close(fd)
Written Data are:
[[ 11.2465988 5.45304778]
[ 12.06466331 9.95717255]
[ 7.35402895 1.68972606]
[ 0.7259652 1.01265826]
[ 3.11340311 2.44725153]
[ 2.82109715 5.02768335]
[ 12.69054614 9.26028537]
[ 5.13785639 2.0780649 ]
[ 4.6796513 4.24710598]
[ 2.34859141 8.87224674]]
------------------
Data Read are:
[[ 11.2465988 5.45304778]
[ 12.06466331 9.95717255]
[ 7.35402895 1.68972606]
[ 0.7259652 1.01265826]
[ 3.11340311 2.44725153]
[ 2.82109715 5.02768335]
[ 12.69054614 9.26028537]
[ 5.13785639 2.0780649 ]
[ 4.6796513 4.24710598]
[ 2.34859141 8.87224674]]
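The comment "Later I can read from any row!" implies that the offset passed to os.lseek must be given in bytes, not rows: start_row * dim * itemsize. Below is a minimal sketch, assuming a row-major float64 file like the one written above. The helpers write_array and read_rows are hypothetical names. Both helpers loop because os.read and os.write may transfer fewer bytes than requested. They also OR in os.O_BINARY, which exists only on Windows (hence the getattr fallback), where os.open otherwise opens files in text mode. O_TRUNC ensures that rewriting a smaller array does not leave stale bytes at the end.

```python
import os
import numpy as np

DTYPE = np.dtype(np.float64)
# os.O_BINARY only exists on Windows; elsewhere this contributes no flag.
BINARY = getattr(os, "O_BINARY", 0)

def write_array(path, arr):
    """Write arr's raw bytes to path, replacing any previous content."""
    fd = os.open(path, os.O_CREAT | os.O_WRONLY | os.O_TRUNC | BINARY, 0o644)
    try:
        buf = memoryview(arr.astype(DTYPE).tobytes())
        while buf:  # os.write may write fewer bytes than asked
            written = os.write(fd, buf)
            buf = buf[written:]
    finally:
        os.close(fd)

def read_rows(path, start_row, num_rows, dim):
    """Read up to num_rows rows of dim DTYPE values, starting at start_row."""
    row_bytes = dim * DTYPE.itemsize
    want = num_rows * row_bytes
    fd = os.open(path, os.O_RDONLY | BINARY)
    try:
        os.lseek(fd, start_row * row_bytes, os.SEEK_SET)  # offset is in BYTES
        chunks = []
        while want > 0:  # os.read may return fewer bytes than asked
            chunk = os.read(fd, want)
            if not chunk:  # end of file reached
                break
            chunks.append(chunk)
            want -= len(chunk)
    finally:
        os.close(fd)
    raw = b"".join(chunks)
    usable = len(raw) - len(raw) % row_bytes  # drop a trailing partial row
    return np.frombuffer(raw[:usable], dtype=DTYPE).reshape(-1, dim)

data = np.random.uniform(low=0.5, high=13.3, size=(10, 2))
write_array("test.dat", data)
part = read_rows("test.dat", start_row=3, num_rows=4, dim=2)
print(np.array_equal(part, data[3:7]))  # True
```

Asking for more rows than remain simply returns the rows that exist. For example, read_rows("test.dat", 8, 5, 2) yields a (2, 2) array.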