I have a large list of data that I want to split into a training list and a test list. I can split it like this:
cutoff = int(.7 * len(data_list))  # 70% of the data is used for training
training_list = data_list[:cutoff]
test_list = data_list[cutoff:]
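But I don't think this is a good strategy for evaluating my tagger, because the split always falls at the same position. How can I split my list in the same proportion, but at different places, and get a reliable evaluation score? Thanks!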
Answer 0 (score: 0)
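There is a function called train_test_split() in sklearn.model_selection. It shuffles the data before splitting, so each call puts different items in the training and test sets.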
Documentation: Train Test Split
You can also find examples at the same link.
>>> from sklearn.model_selection import train_test_split
>>> aaa = list(range(20))
>>> aaa
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19]
>>> train_test_split(aaa, test_size=0.3)
[[7, 14, 2, 1, 5, 13, 3, 8, 9, 17, 15, 0, 10, 16], [11, 6, 4, 19, 12, 18]]
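
To get a more reliable score, one option is to repeat the random 70/30 split several times and average the results. Below is a minimal sketch of that idea using only the standard library. The names repeated_splits and evaluate are hypothetical, and evaluate is only a placeholder: you would replace its body with code that trains your tagger on training_list and scores it on test_list.

```python
import random
from statistics import mean


def repeated_splits(data_list, train_fraction=0.7, n_repeats=5, seed=0):
    """Yield (training_list, test_list) pairs, each from a fresh shuffle."""
    rng = random.Random(seed)  # fixed seed so the splits are reproducible
    cutoff = int(train_fraction * len(data_list))
    for _ in range(n_repeats):
        shuffled = list(data_list)  # copy, so the caller's list is not modified
        rng.shuffle(shuffled)
        yield shuffled[:cutoff], shuffled[cutoff:]


def evaluate(training_list, test_list):
    # Hypothetical placeholder: train the tagger on training_list and
    # return its accuracy on test_list. Here it just returns the share of
    # even numbers in the test list so that the sketch runs on its own.
    return sum(1 for x in test_list if x % 2 == 0) / len(test_list)


data_list = list(range(20))
scores = [evaluate(train, test) for train, test in repeated_splits(data_list)]
average_score = mean(scores)
print(scores, average_score)
```

The spread of the individual scores also gives a rough idea of how much the result depends on where the split happens to fall.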