Question

I am reading these two TSV files with pandas:

train = pd.read_csv('https://raw.githubusercontent.com/loretoparisi/bert_text_classifier/master/data/imdb_kaggle/train.tsv', sep="\t", delimiter="\n", quoting=csv.QUOTE_ALL, engine="python", quotechar='"', encoding="utf-8")
test = pd.read_csv('https://raw.githubusercontent.com/loretoparisi/bert_text_classifier/master/data/imdb_kaggle/test.tsv', sep="\t", delimiter="\n", quoting=csv.QUOTE_ALL, engine="python", quotechar='"', encoding="utf-8")

But the shape I get is (N, 1) instead of (N, 3):

(156060, 1) (66292, 1)
PhraseId\tSentenceId\tPhrase
0   156061\t8545\tAn intermittently pleasing but m...
1   156062\t8545\tAn intermittently pleasing but m...
2   156063\t8545\tAn
3   156064\t8545\tintermittently pleasing but most...
4   156065\t8545\tintermittently pleasing but most...

The original file looks like this:

PhraseId    SentenceId  Phrase
156061  8545    An intermittently pleasing but mostly routine effort .
156062  8545    An intermittently pleasing but mostly routine effort
156063  8545    An
156064  8545    intermittently pleasing but mostly routine effort
156065  8545    intermittently pleasing but mostly routine
156066  8545    intermittently pleasing but
156067  8545    intermittently pleasing
156068  8545    intermittently
156069  8545    pleasing

Given that I already pass the separator sep='\t', why does read_csv fail to split the columns?
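One likely cause, offered as a sketch rather than a confirmed diagnosis: the pandas documentation describes `delimiter` as an alias for `sep`, and the calls above pass both. If `delimiter="\n"` takes precedence, no tab is ever used as a field separator, which would explain the single column with literal `\t` in it. Newer pandas releases may reject this combination outright. The example below uses a small in-memory sample (an assumption, standing in for the remote file) and passes only `sep="\t"`. The `csv.QUOTE_NONE` setting is also an assumption, chosen so that any `"` inside a phrase is treated as ordinary text.

```python
import csv
import io

import pandas as pd

# Hypothetical in-memory sample mirroring the first rows of test.tsv,
# used here so the example runs without network access.
sample = (
    "PhraseId\tSentenceId\tPhrase\n"
    "156061\t8545\tAn intermittently pleasing but mostly routine effort .\n"
    "156062\t8545\tAn intermittently pleasing but mostly routine effort\n"
    "156063\t8545\tAn\n"
)

# Pass only sep="\t"; do not also pass delimiter, since it is an alias of sep.
# QUOTE_NONE (an assumption) disables special handling of quote characters.
df = pd.read_csv(io.StringIO(sample), sep="\t", quoting=csv.QUOTE_NONE)

print(df.shape)
print(df.columns.tolist())
```

The same keyword arguments should apply when the first argument is the raw GitHub URL instead of the `StringIO` buffer.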

Pandas reads TSV with wrong columns

0 answers: