TypeError: iteration over a 0-d array when trying to use TextLMDataBunch.from_csv in FastAI

Time: 2018-12-01 18:17:15

Tags: encoding ascii utf fast-ai

The library requires utf-8. I tried converting a us-ascii file to utf-8 with:

iconv -f us-ascii -t utf-8 src.csv > target.csv

When I then run:

file -I target.csv

it still reports the charset as us-ascii. I then found out that us-ascii is a subset of utf-8, and that the file command can only guess the file type.
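The subset relationship can be checked directly. The following is a minimal sketch, assuming a made-up sample string in place of the real CSV contents: text made only of ASCII characters encodes to the same bytes under both codecs, so the iconv step leaves such a file unchanged.

```python
# Hypothetical sample content; the real src.csv is not shown in the question.
text = "label,text\n0,hello world\n"

ascii_bytes = text.encode("ascii")
utf8_bytes = text.encode("utf-8")

# For pure-ASCII text the two encodings produce identical bytes.
same = ascii_bytes == utf8_bytes
print(same)

# Rough equivalent of `iconv -f us-ascii -t utf-8`: decode as ASCII, re-encode as UTF-8.
converted = ascii_bytes.decode("ascii").encode("utf-8")
unchanged = converted == ascii_bytes
print(unchanged)
```

This would explain why the reported charset stays us-ascii after the conversion: there is nothing in the bytes that distinguishes the two.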

However, if I use src.csv as the input to TextLMDataBunch.from_csv(), it works. If I do:

cat src.csv > target.csv

and then use target.csv as the input to the same library, it does not work and fails with the following error:

   TypeError                                 Traceback (most recent call last)
<ipython-input-118-44bc7147d2a4> in <module>()
----> 1 data_lm = TextLMDataBunch.from_csv(sample_p, 'voila.csv')

/usr/local/lib/python3.6/dist-packages/fastai/text/data.py in from_csv(cls, path, csv_name, valid_pct, test, tokenizer, vocab, classes, header, text_cols, label_cols, label_delim, **kwargs)
    180         test_df = None if test is None else pd.read_csv(Path(path)/test, header=header)
    181         return cls.from_df(path, train_df, valid_df, test_df, tokenizer, vocab, classes, text_cols,
--> 182                            label_cols, label_delim, **kwargs)
    183 
    184     @classmethod

/usr/local/lib/python3.6/dist-packages/fastai/text/data.py in from_df(cls, path, train_df, valid_df, test_df, tokenizer, vocab, classes, text_cols, label_cols, label_delim, **kwargs)
    165         src = ItemLists(path, TextList.from_df(train_df, path, cols=text_cols, processor=processor),
    166                         TextList.from_df(valid_df, path, cols=text_cols, processor=processor))
--> 167         src = src.label_for_lm() if cls==TextLMDataBunch else src.label_from_df(cols=label_cols, classes=classes, sep=label_delim)
    168         if test_df is not None: src.add_test(TextList.from_df(test_df, path, cols=text_cols))
    169         return src.databunch(**kwargs)

/usr/local/lib/python3.6/dist-packages/fastai/data_block.py in _inner(*args, **kwargs)
    356         assert isinstance(fv, Callable)
    357         def _inner(*args, **kwargs):
--> 358             self.train = ft(*args, **kwargs)
    359             assert isinstance(self.train, LabelList)
    360             self.valid = fv(*args, **kwargs)

/usr/local/lib/python3.6/dist-packages/fastai/text/data.py in label_for_lm(self, **kwargs)
    285         "A special labelling method for language models."
    286         self.__class__ = LMTextList
--> 287         return self.label_const(0, label_cls=LMLabel)
    288 
    289     def reconstruct(self, t:Tensor):

/usr/local/lib/python3.6/dist-packages/fastai/data_block.py in label_const(self, const, **kwargs)
    211     def label_const(self, const:Any=0, **kwargs)->'LabelList':
    212         "Label every item with `const`."
--> 213         return self.label_from_func(func=lambda o: const, **kwargs)
    214 
    215     def label_empty(self):

/usr/local/lib/python3.6/dist-packages/fastai/data_block.py in label_from_func(self, func, **kwargs)
    219     def label_from_func(self, func:Callable, **kwargs)->'LabelList':
    220         "Apply `func` to every input to get its label."
--> 221         return self.label_from_list([func(o) for o in self.items], **kwargs)
    222 
    223     def label_from_folder(self, **kwargs)->'LabelList':

TypeError: iteration over a 0-d array

Can someone tell me what is wrong? I am trying this on Google Colab, and I tried the character-encoding changes on both Colab and a Mac, with no result.
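Since `cat src.csv > target.csv` should produce a byte-for-byte copy, one way to narrow this down is to confirm that the two files really match and that the file being loaded has the expected number of rows and columns. Note that the traceback passes 'voila.csv', so the check is worth running on whichever file is actually loaded. The sketch below uses only the Python standard library; `file_summary` and `compare_files` are hypothetical helper names, not part of fastai, and the code assumes the files decode as utf-8.

```python
import csv
import hashlib
import io
from pathlib import Path


def file_summary(path):
    """Return a SHA-256 digest plus row and column counts for a CSV file."""
    raw = Path(path).read_bytes()
    # Decoding as utf-8 also accepts pure-ASCII content.
    reader = csv.reader(io.StringIO(raw.decode("utf-8"), newline=""))
    rows = list(reader)
    return {
        "sha256": hashlib.sha256(raw).hexdigest(),
        "rows": len(rows),
        "max_columns": max((len(r) for r in rows), default=0),
    }


def compare_files(a, b):
    """True if the two files are byte-for-byte identical."""
    return Path(a).read_bytes() == Path(b).read_bytes()


if __name__ == "__main__":
    # File names taken from the question; skipped if they are not present.
    for name in ("src.csv", "target.csv", "voila.csv"):
        if Path(name).exists():
            print(name, file_summary(name))
```

If the digests match, the encoding is unlikely to be the cause, and the difference probably lies elsewhere, for example in which file or directory is passed to from_csv.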

0 answers:

No answers yet