I'm trying to read files from disk and then split them into features and labels:
import os

def generator(data_path):
    x_text = []
    _y = []
    counter = 0
    for root, dirs, files in os.walk(data_path):
        for _file in files:
            if _file.endswith(".txt"):
                # Read and strip every line of the file.
                with open(os.path.join(root, _file), "r",
                          encoding="UTF8", errors="ignore") as f:
                    _contents = [s.strip() for s in f.readlines()]
                x_text = x_text + _contents
                # One-hot label for this file's class.
                y_examples = [0, 0, 0]
                y_examples[counter] = 1
                # Copy the label per line so the rows don't share one list.
                y_labels = [list(y_examples) for s in _contents]
                counter += 1
                _y = _y + y_labels
    return [x_text, _y]
I have 3.5 GB of data on disk and can't read it all into memory at once. How can I modify this code so that it yields n files at a time for processing?
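One way is to turn the function into a true Python generator that accumulates n files and then yields each batch with `yield`, so only one batch is ever in memory. A minimal sketch, keeping the three-class one-hot labeling from the code above (the name `batch_generator`, the batch size `n`, and the `counter % 3` label assignment are assumptions, not the original code):

```python
import os

def batch_generator(data_path, n=100):
    """Yield (x_batch, y_batch) built from n .txt files at a time."""
    x_text, _y = [], []
    files_in_batch = 0
    counter = 0
    for root, dirs, files in os.walk(data_path):
        for _file in sorted(files):
            if not _file.endswith(".txt"):
                continue
            with open(os.path.join(root, _file), "r",
                      encoding="UTF8", errors="ignore") as f:
                contents = [s.strip() for s in f]
            # Assumed labeling: cycle through three classes, one per file.
            one_hot = [0, 0, 0]
            one_hot[counter % 3] = 1
            counter += 1
            x_text.extend(contents)
            _y.extend([list(one_hot) for _ in contents])
            files_in_batch += 1
            if files_in_batch == n:
                yield x_text, _y          # hand one batch to the caller
                x_text, _y = [], []       # then drop it from memory
                files_in_batch = 0
    if x_text:  # leftover files smaller than a full batch
        yield x_text, _y
```

With this shape, the training loop `for X_batch, y_batch in batch_generator(data_path, n=10): feed_dict = {X: X_batch, y: y_batch}` works as written, because the function now produces (x, y) pairs lazily instead of one giant list.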
for X_batch, y_batch in generator(data_path):
    feed_dict = {X: X_batch, y: y_batch}
Is there a more efficient way to read large amounts of data in TensorFlow?
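For TensorFlow specifically, the `tf.data` API streams text files from disk and batches them lazily, so the 3.5 GB corpus never has to fit in memory. A sketch under the assumption that each class lives in its own file (the `file_paths` list is hypothetical):

```python
import tensorflow as tf

# Hypothetical layout: one text file per class.
file_paths = ["class0.txt", "class1.txt", "class2.txt"]

datasets = []
for label_idx, path in enumerate(file_paths):
    label = tf.one_hot(label_idx, depth=3)
    # Pair every line of the file with its class's one-hot label.
    # The default argument pins this class's label inside the lambda.
    ds = tf.data.TextLineDataset(path).map(
        lambda line, lab=label: (line, lab))
    datasets.append(ds)

# Chain the per-class datasets, then shuffle and batch lazily.
dataset = datasets[0]
for ds in datasets[1:]:
    dataset = dataset.concatenate(ds)
dataset = dataset.shuffle(buffer_size=10000).batch(64)
```

Nothing is read from disk until the dataset is iterated. In graph mode you would then create an iterator over `dataset` (in TF 1.x, `dataset.make_one_shot_iterator().get_next()`) and run it in the session in place of the `feed_dict`, which also avoids the Python-to-graph copy that feeding implies.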