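Is it possible to split a DataFrame into training and test sets by specifying the actual sizes I want rather than a ratio? Most examples I have seen use randomSplit.
463715 training samples
51630 test samples
In scikit-learn I was able to do this, for example: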
{
  "users": [
    {
      "customerId": "2kXE3upOg5hnOG",
      "ccoId": "paalle",
      "userGroups": [
        "CX Cloud Super Admins",
        "CX Cloud Admins",
        "aAutoGroupMarked12"
      ],
      "emailId": "paalle@test.com",
      "fullName": "Pavan Alle",
      "isSelected": true
    },
    {
      "customerId": "2kXE3upOg5hnOG",
      "ccoId": "rtejanak",
      "userGroups": [
        "aTestUserGroupname1234"
      ],
      "emailId": "rtejanak@test.com",
      "fullName": "Raja Ravi Teja Nakirikanti"
    }
  ],
  "pagination": {
    "pageNumber": 1,
    "totalPages": 2,
    "rowPerPage": 10,
    "totalRows": 11
  }
}
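
To make the goal concrete, here is a minimal, framework-agnostic sketch of what I mean by splitting on absolute counts instead of ratios: shuffle the rows once, then slice off exactly the number of rows wanted for each set. The function name split_by_count and the seed value are hypothetical choices for illustration; they are not part of Spark or scikit-learn. (For comparison, scikit-learn's train_test_split accepts integers for train_size and test_size and treats them as absolute sample counts.)

```python
import random


def split_by_count(rows, n_train, n_test, seed=42):
    """Shuffle rows and return (train, test) with exactly the requested sizes.

    Hypothetical helper for illustration only; not a Spark or scikit-learn API.
    """
    if n_train + n_test > len(rows):
        raise ValueError("requested sizes exceed the number of rows")
    shuffled = list(rows)                  # copy so the input is left untouched
    random.Random(seed).shuffle(shuffled)  # seeded shuffle for reproducibility
    train = shuffled[:n_train]
    test = shuffled[n_train:n_train + n_test]
    return train, test


# Small stand-in data set; the real one would have 463715 + 51630 rows.
rows = list(range(100))
train, test = split_by_count(rows, n_train=90, n_test=10)
print(len(train), len(test))  # prints "90 10"
```

With the sizes from the question, the call would be split_by_count(rows, 463715, 51630). This works on an in-memory list only; what I am looking for is the equivalent on a distributed DataFrame.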