I am reading these two TSV files with pandas:
train = pd.read_csv('https://raw.githubusercontent.com/loretoparisi/bert_text_classifier/master/data/imdb_kaggle/train.tsv', sep="\t", delimiter="\n", quoting=csv.QUOTE_ALL, engine="python", quotechar='"', encoding="utf-8")
test = pd.read_csv('https://raw.githubusercontent.com/loretoparisi/bert_text_classifier/master/data/imdb_kaggle/test.tsv', sep="\t", delimiter="\n", quoting=csv.QUOTE_ALL, engine="python", quotechar='"', encoding="utf-8")
But the shape I get is (N, 1) instead of (N, 3):
(156060, 1) (66292, 1)
PhraseId\tSentenceId\tPhrase
0 156061\t8545\tAn intermittently pleasing but m...
1 156062\t8545\tAn intermittently pleasing but m...
2 156063\t8545\tAn
3 156064\t8545\tintermittently pleasing but most...
4 156065\t8545\tintermittently pleasing but most...
The original file looks like this:
PhraseId SentenceId Phrase
156061 8545 An intermittently pleasing but mostly routine effort .
156062 8545 An intermittently pleasing but mostly routine effort
156063 8545 An
156064 8545 intermittently pleasing but mostly routine effort
156065 8545 intermittently pleasing but mostly routine
156066 8545 intermittently pleasing but
156067 8545 intermittently pleasing
156068 8545 intermittently
156069 8545 pleasing
Given that I passed the separator sep='\t', why did read_csv fail to split the columns?
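For comparison, here is a minimal sketch of a call that does produce three columns. It rests on one assumption about the cause: pandas documents `delimiter` as an alias for `sep`, so passing `delimiter="\n"` together with `sep="\t"` probably replaces the tab separator (recent pandas versions may instead raise an error when both are given). The `sample` string is a hypothetical in-memory stand-in for the real file, built from the rows shown above, so the snippet runs without network access.

```python
import io

import pandas as pd

# Hypothetical stand-in for train.tsv / test.tsv: a few tab-separated rows
# copied from the file excerpt in the question.
sample = (
    "PhraseId\tSentenceId\tPhrase\n"
    "156061\t8545\tAn intermittently pleasing but mostly routine effort .\n"
    "156062\t8545\tAn intermittently pleasing but mostly routine effort\n"
    "156063\t8545\tAn\n"
)

# Pass only sep="\t". No delimiter argument is given here, because
# delimiter is an alias for sep and would compete with it.
df = pd.read_csv(io.StringIO(sample), sep="\t")

print(df.shape)             # rows x columns of the parsed sample
print(df.columns.tolist())  # column names taken from the header row
```

If this is indeed the cause, the same call with the real URLs in place of `io.StringIO(sample)` should give (N, 3). Whether the `quoting` and `quotechar` arguments are still needed depends on how quotes appear in the phrases, which I have not checked.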