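I want to convert my data to an HDF5 file by following someone else's code. However, when it reaches the line
    row[j] = float(fields[j])
it reports
    numpy.float32 does not support item assignment
Is there something wrong with create_dataset?
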
    import h5py
    import pandas as pd
    import numpy as np
    import os
    import math
    import sys

    def add_lines(dset, lines):
        num_lines = len(lines)
        if num_lines == 0:
            return
        numrows = dset.shape[0]
        dset.resize(((numrows+num_lines),))
        rows = dset[numrows:(numrows+num_lines)]
        for i in range(num_lines):
            line = lines[i]
            row = rows[i]
            fields = line.split(",")
            for j in range(0,len(fields)):
                row[j] = float(fields[j])
        dset[numrows:(numrows+num_lines)] = rows

    if '__main__' == __name__:
        print 'Loading...'
        day = sys.argv[1]
        file = day+".xls"
        batch_size = 100000
        lines = []
        f = h5py.File("out", 'a')
        if "dset" not in f:
            dset = f.create_dataset("dset", (0,), dtype="float", maxshape=(None,))
        else:
            dset = f['dset']
        with open(file, "r") as g:
            for line in g:
                line = line.strip()
                lines.append(line)
                if len(lines) == batch_size:
                    add_lines(dset, lines)
                    lines = []
        add_lines(dset, lines)

Answer 0 (score: 0)
Try changing
    for j in range(0,len(fields)):
        row[j] = float(fields[j])
to
    row[:] = [float(v) for v in fields]
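
To see why the shape matters for this suggestion, here is a small NumPy-only sketch (the array names and sample values are made up for illustration). Indexing a 1-D array returns a NumPy scalar, which cannot be assigned into, whereas indexing a 2-D array returns a row that accepts the slice assignment:

```python
import numpy as np

# 1-D array: indexing a single element yields a NumPy scalar, not a row.
flat = np.zeros(3, dtype="float32")
scalar = flat[0]
try:
    scalar[0] = 1.0  # fails the same way as row[j] = ... in the question
    error_message = None
except TypeError as exc:
    error_message = str(exc)

# 2-D array: indexing a single row yields a writable view into the array.
table = np.zeros((2, 3), dtype="float32")
fields = "1.5,2.5,3.5".split(",")  # hypothetical input line
row = table[0]
row[:] = [float(v) for v in fields]  # writes through to table[0]
```

So the slice assignment only helps once each row really is a row, that is, once the array being indexed is 2-D.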
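Beyond that, dset looks like a 1-D array rather than a 2-D one, so rows[i] is a single number and not a row.
You will probably need to change how dset is initialized before you try to assign values to it line by line from the input.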
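
One possible way to initialize it, as a rough sketch rather than a drop-in fix: create the dataset as 2-D with a fixed number of columns and an unlimited number of rows. NUM_COLS, the file name "sketch.h5", and the sample lines are assumptions for illustration; the real column count has to match the input file, and every line is assumed to contain exactly that many numeric fields. The in-memory "core" driver is used only so that the example leaves nothing on disk.

```python
import h5py
import numpy as np

NUM_COLS = 3  # assumed number of comma-separated fields per line


def add_lines(dset, lines):
    """Append parsed lines to a 2-D dataset that is resizable along axis 0."""
    if not lines:
        return
    # Parse the batch in memory first, then write it in a single assignment.
    rows = np.array(
        [[float(v) for v in line.split(",")] for line in lines],
        dtype=dset.dtype,
    )
    numrows = dset.shape[0]
    dset.resize(numrows + len(lines), axis=0)
    dset[numrows:] = rows


# In-memory file (nothing is written to disk) purely for demonstration.
f = h5py.File("sketch.h5", "w", driver="core", backing_store=False)
dset = f.create_dataset(
    "dset",
    shape=(0, NUM_COLS),
    dtype="float64",
    maxshape=(None, NUM_COLS),  # rows unlimited, columns fixed
)
add_lines(dset, ["1,2,3", "4,5,6"])
add_lines(dset, ["7,8,9"])
result = dset[...]  # read everything back as a NumPy array
f.close()
```

Note that if the output file already contains a 1-D "dset" from an earlier run, it would have to be deleted or given a different name, since the shape of an existing dataset cannot be changed from 1-D to 2-D.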