Python, Sklearn: How do I reverse Sklearn's train_test_split?

Time: 2018-04-11 13:50:06

Tags: python scikit-learn

Suppose I have a dataset X and its labels Y, and I split it into a training set and a test set with a test size of 0.2, shuffled with random seed 11:

>>> X.shape
(10000, 50, 50)

train_data, test_data, train_label, test_label = train_test_split(X, Y, test_size=0.2, random_state=11, shuffle=True)

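How can I find the original index of each sample in the split data, in other words, reverse the random shuffle?

For example, which X[?] corresponds to train_data[123]?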

1 answer:

Answer 0: (score: 1)

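Depending on the type of data, you can recover the indices fairly easily. If the rows in the training data are unique and non-repeating, you can stringify every element of X and then use the list's index() method to find each element's position in X.

For example: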

from sklearn.model_selection import train_test_split

X = ['i like wanda', 'i dont like anything', 'does this matter', 'this is choice test', 'how are you useful', 'are you mattering', 'this is a random test', 'this is my test', 'i dont like math', 'how can anything matter', 'who does matter', 'i like water', 'this is someone test', 'how does it matter', 'what is horrible', 'i dont like you', 'this is a valid test', 'this is a sample test', 'i like everything', 'i like ice cream', 'how can anything be useful', 'how is this useful', 'this is horrible', 'i dont like jokes']

Y = ['0', '0', '1', '0', '1', '1', '0', '0', '0', '1', '1', '0', '0', '1', '1', '0', '0', '0', '0', '0', '1', '1', '0', '0']
train_data, test_data, train_label, test_label = train_test_split(X, Y, test_size=0.2, random_state=11, shuffle=True)
# Look up where each training sample sits in the original list X
for each in train_data:
    print(X.index(each))

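The above prints the original index in X of each training sample. This only works here because the elements of X are distinct and are strings. For more complex data types, you may need to do additional processing.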
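For array data like the (10000, 50, 50) X in the question, one possible alternative that avoids searching by value is to split an array of row positions together with X and Y, since train_test_split accepts any number of arrays of equal length. The following is only a sketch under assumptions: the random data, the reduced shape, and the names indices, train_idx, test_idx, and k are illustrative and do not come from the question.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Hypothetical stand-in data: same rank as the question's X, but smaller.
rng = np.random.default_rng(0)
X = rng.random((100, 5, 5))
Y = rng.integers(0, 2, size=100)

# Row positions of X; splitting them alongside X and Y means each
# subset carries the original index of every sample it contains.
indices = np.arange(len(X))

(train_data, test_data,
 train_label, test_label,
 train_idx, test_idx) = train_test_split(
    X, Y, indices, test_size=0.2, random_state=11, shuffle=True
)

# train_data[k] is the same sample as X[train_idx[k]]
k = 12  # illustrative position in the training set
original_position = train_idx[k]
print(original_position)
```

With this approach, train_idx[123] would answer the question's example directly, and it does not depend on the samples being unique or hashable.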