CTF Reader throws an error for large files in CNTK

Time: 2016-12-27 05:44:54

Tags: c++ nlp deep-learning cntk

I am using the CTF reader function, following the CNTK tutorials on GitHub.

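# Imports as used in the CNTK 2.0 beta tutorials (assumed; not shown in the original post)
from cntk.io import (MinibatchSource, CTFDeserializer, StreamDef, StreamDefs,
                     INFINITELY_REPEAT, FULL_DATA_SWEEP)

def create_reader(path, is_training, input_dim, label_dim):
    return MinibatchSource(CTFDeserializer(path, StreamDefs(
        features = StreamDef(field='x', shape=input_dim, is_sparse=True),
        labels = StreamDef(field='y', shape=label_dim, is_sparse=False)
    )), randomize=is_training, epoch_size=INFINITELY_REPEAT if is_training else FULL_DATA_SWEEP)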

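This works perfectly well until the input file grows beyond a certain size (I don't know exactly what size). Then it throws an error like this: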

WARNING: Sparse index value (269) at offset 8923303 in the input file (C:\local\CNTK-2-0-beta6-0-Windows-64bit-CPU-Only\cntk\Examples\common\data_pos_train_balanced_ctf.txt) exceeds the maximum expected value (268).
attempt: Reached the maximum number of allowed errors while reading the input file (C:\local\CNTK-2-0-beta6-0-Windows-64bit-CPU-Only\cntk\Examples\common\data_pos_train_balanced_ctf.txt)., retrying 2-th time out of 5...
.
.
.

RuntimeError: Reached the maximum number of allowed errors while reading the input file (C:\local\CNTK-2-0-beta6-0-Windows-64bit-CPU-Only\cntk\Examples\common\data_pos_train_balanced_ctf.txt).

I found that this error is thrown in TextParser.cpp: https://github.com/Microsoft/CNTK/blob/5633e79febe1dc5147149af9190ad1944742328a/Source/Readers/CNTKTextFormatReader/TextParser.cpp

Is there a solution or workaround for this?

1 answer:

Answer 0: (score: 2)

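You need to know the dimension of your input, and keep in mind that indices start at 0. So if you created an input file that maps your vocabulary to the range 1 to 20000, the dimension is 20001. The warning in the question points the same way: index 269 exceeds the maximum expected value of 268, which suggests input_dim was set to 269, whereas an index of 269 needs a dimension of at least 270. The file size itself is probably not the cause; a larger file is simply more likely to contain a higher index.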
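To check which dimension a file actually needs, you could scan it for the largest sparse index before creating the reader. The following is only a minimal sketch, not part of CNTK: the helper names max_sparse_index and required_dim are hypothetical, and it assumes the usual CTF layout in which each stream starts with |name and sparse entries are written as index:value.

```python
def max_sparse_index(lines, field):
    """Return the largest sparse index found for `field`, or -1 if there is none."""
    largest = -1
    for line in lines:
        # Streams start with '|name'; anything before the first '|' is an
        # optional sequence id and is skipped.
        for chunk in line.split('|')[1:]:
            tokens = chunk.split()
            if not tokens or tokens[0] != field:
                continue
            for token in tokens[1:]:
                index, sep, _value = token.partition(':')
                if sep:  # sparse entries look like 'index:value'
                    largest = max(largest, int(index))
    return largest


def required_dim(lines, field):
    # Indices are 0-based, so the dimension must be at least max index + 1.
    return max_sparse_index(lines, field) + 1


# Made-up sample lines that mimic the situation in the question.
sample = [
    "|x 3:1 268:1 |y 1 0",
    "|x 0:1 269:1 |y 0 1",
]
print(required_dim(sample, "x"))  # 270
```

For a real file, pass the open file object instead of the sample list, for example required_dim(open(path), 'x'), and use the result (or anything larger) as input_dim in create_reader.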