I have a list of lists like this:
list = [[[bad, good],"Antonyms"], [[good, nice],"Synonyms"]]
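I need to split this data into training, development, and test sets (60%, 20%, 20%), and I don't know how to do it. Similar questions didn't give me an answer. Maybe somebody has an idea?
Thanks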
Answer 0 (score: 1)
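I assume Antonyms and Synonyms are some kind of class labels. With sklearn's train_test_split we can split the data in two steps.
Note: I changed bad, good, etc. to strings. Hopefully your dataset looks like that as well.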
import numpy as np
from sklearn.model_selection import train_test_split
my_list = [[['bad', 'good'],"Antonyms"], [['good', 'nice'],"Synonyms"],
[['good', 'nice'],"Synonyms"],[['good', 'nice'],"Synonyms"],
[['good', 'nice'],"Synonyms"]]
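# dtype=object keeps each row as (word pair, label); recent NumPy versions
# raise an error for this mixed list/string nesting without it
data = np.array(my_list, dtype=object)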
print(data.shape)
#(5, 2)
X,y=data[:,0],data[:,1]
#split the data to get 60% train and 40% test
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.4, random_state=42)
#split the test again to get 20% dev and 20% test
X_dev, X_test, y_dev, y_test = train_test_split(X_test, y_test, test_size=0.5, random_state=42)
print(y_train.shape,y_dev.shape,y_test.shape)
#(3,) (1,) (1,)
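The two calls above chain their fractions: the first holds out 40% of the data, and the second halves that hold-out, so dev and test each get 0.4 × 0.5 = 0.2 of the original. For other ratios the same arithmetic can be sketched as below. Note that chained_test_sizes is a hypothetical helper name, not part of sklearn; it only computes the two test_size values you would pass to train_test_split.

```python
def chained_test_sizes(train, dev, test):
    """Return the test_size values for two chained train_test_split calls.

    Hypothetical helper (not part of sklearn). The three fractions must sum to 1.
    """
    if abs(train + dev + test - 1.0) > 1e-9:
        raise ValueError("fractions must sum to 1")
    if dev + test <= 0:
        raise ValueError("dev + test must be positive")
    first = dev + test            # share held out by the first split
    second = test / (dev + test)  # share of that hold-out that becomes the final test set
    return first, second


first, second = chained_test_sizes(0.6, 0.2, 0.2)
print(first, second)
```

For a 60/20/20 split this gives the values used above (0.4, then 0.5). For 80/10/10 the second value is still 0.5, while for 70/20/10 it would be one third.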
Answer 1 (score: 1)
train, development, and test will be the three lists produced at the end.
import random
l = [[['bad0', 'good0'], 'Antonyms0'], [['good0', 'nice0'], 'Synonyms0'],
[['bad1', 'good1'], 'Antonyms1'], [['good1', 'nice1'], 'Synonyms1'],
[['bad2', 'good2'], 'Antonyms2'], [['good2', 'nice2'], 'Synonyms2'],
[['bad3', 'good3'], 'Antonyms3'], [['good3', 'nice3'], 'Synonyms3'],
]
# Initialize the three lists.
train = []
development = []
test = []
for i in l:
    r = random.uniform(0, 1)  # Draw a fresh random number in [0, 1] for every item.
    if r <= 0.6:
        train.append(i)        # roughly 60% of the items
    elif r <= 0.8:
        development.append(i)  # roughly 20% of the items
    else:
        test.append(i)         # roughly 20% of the items
The assignment is random, so on a small list the sizes only approximate 60/20/20. Output of one run:
train
[[['good1', 'nice1'], 'Synonyms1'],
 [['bad3', 'good3'], 'Antonyms3'],
 [['good3', 'nice3'], 'Synonyms3']]
development
[[['bad0', 'good0'], 'Antonyms0'],
 [['good0', 'nice0'], 'Synonyms0'],
 [['bad1', 'good1'], 'Antonyms1'],
 [['bad2', 'good2'], 'Antonyms2'],
 [['good2', 'nice2'], 'Synonyms2']]
test
[]
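Because this approach assigns each item by a random draw, the list sizes vary from run to run, and the sample output above ends up with an empty test list. If the sizes need to match the requested fractions as closely as the item count allows, one alternative is to shuffle once and slice. This is a minimal sketch using only the standard library; split_exact, its parameters, and the fixed seed are assumptions for illustration, not part of the original answer.

```python
import random


def split_exact(items, train_frac=0.6, dev_frac=0.2, seed=42):
    """Shuffle a copy of items and slice it into train/dev/test lists."""
    shuffled = list(items)                 # copy, so the caller's list is untouched
    random.Random(seed).shuffle(shuffled)  # seeded for a reproducible split
    n_train = round(len(shuffled) * train_frac)
    n_dev = round(len(shuffled) * dev_frac)
    train = shuffled[:n_train]
    dev = shuffled[n_train:n_train + n_dev]
    test = shuffled[n_train + n_dev:]      # whatever remains goes to test
    return train, dev, test


# Ten labelled word pairs in the same shape as the question's data.
pairs = []
for k in range(5):
    pairs.append([['bad%d' % k, 'good%d' % k], 'Antonyms%d' % k])
    pairs.append([['good%d' % k, 'nice%d' % k], 'Synonyms%d' % k])

train, dev, test = split_exact(pairs)
print(len(train), len(dev), len(test))
```

With ten items this yields list sizes of 6, 2, and 2. Each item stays intact as a [word pair, label] entry, and every item lands in exactly one of the three lists.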