https://www.dropbox.com/s/35w66sri5rauv5d/FlightDelays.csv?dl=0
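I am reading a dataset with 2201 rows from the link above. I split it with sample.split using a ratio of 0.6, so I expect two datasets with 1320 and 881 rows respectively. Initially this worked, but now the split lands at roughly 0.53 even though I still specify 0.6 as the ratio. What could cause this sudden change, and how can I fix it? The code is given below.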
library(caTools)
originaldata.df<-read.csv("use csv from the link given above")
split<-sample.split(originaldata.df,SplitRatio = 0.6)
Trainingdataset<-subset(originaldata.df,split == "TRUE")
Testingdataset<-subset(originaldata.df,split == "FALSE")
Expected output:
1320 (2201*60/100)
881 (2201*40/100)
Actual output:
1186
1015
Answer 0 (score: 0)
You can sample row indices and assign the rows according to the split ratio:
indexes = sample(1:nrow(originaldata.df),
size=0.6*nrow(originaldata.df))
Trainingdataset <- originaldata.df[indexes,]
Testingdataset <- originaldata.df[-indexes,]
This gives the following output:
> dim(Testingdataset)
# [1] 881 13
> dim(Trainingdataset)
# [1] 1320 13
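For a repeatable version of the index-based split, one option is to fix the random seed and round the training size explicitly. The sketch below is only an illustration: `demo.df` is a made-up stand-in with 2201 rows rather than the real FlightDelays data, and the seed value is arbitrary.

```r
set.seed(42)  # arbitrary seed, makes the split repeatable

# Stand-in data with the same number of rows as the question's dataset
# (hypothetical columns; use originaldata.df in practice).
demo.df <- data.frame(id = seq_len(2201), value = rnorm(2201))

n <- nrow(demo.df)
train_size <- floor(0.6 * n)                  # 1320 rows for n = 2201
indexes <- sample(seq_len(n), size = train_size)

train.df <- demo.df[indexes, ]                # sampled rows
test.df  <- demo.df[-indexes, ]               # all remaining rows

dim(train.df)
dim(test.df)
```

Because the test set is taken as everything not in `indexes`, the two sets cannot share a row.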
Using the caTools package:
library(caTools)
# Apply sample.split to a single column of the data frame; given the whole data frame, the split vector has one entry per column rather than one per row.
split<-sample.split(originaldata.df$schedtime,SplitRatio = 0.6)
Trainingdataset<-subset(originaldata.df,split == "TRUE")
Testingdataset<-subset(originaldata.df,split == "FALSE")
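The sizes of the subsets (not exactly what you expected, since sample.split preserves the relative ratios of the values in the column):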
> dim(Trainingdataset)
# [1] 1323 13
> dim(Testingdataset)
# [1] 878 13
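A likely explanation for the 1186/1015 result in the question: a data frame is a list of columns, so its length is 13 here, not 2201. Passing the whole data frame therefore yields a 13-element logical vector, which row subsetting then recycles across all 2201 rows. The base R sketch below illustrates the arithmetic; `mask` is a hand-written stand-in for such a vector with 7 TRUE values, not actual sample.split output.

```r
n_rows <- 2201
n_cols <- 13

# Hand-written stand-in for a 13-element split vector with 7 TRUE values
# (an assumption for illustration; the real vector is random).
mask <- c(TRUE, TRUE, TRUE, FALSE, TRUE, FALSE, TRUE,
          FALSE, TRUE, FALSE, TRUE, FALSE, FALSE)
stopifnot(length(mask) == n_cols)

# Row subsetting recycles the short logical vector across all rows.
recycled <- rep_len(mask, n_rows)

sum(recycled)    # 1186 rows would land in the training set
sum(!recycled)   # 1015 rows would land in the test set
mean(recycled)   # effective ratio, about 0.54 instead of 0.6
```

With a different arrangement of the TRUE values the counts shift by a few rows, but the ratio stays near 7/13 rather than 0.6.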
Answer 1 (score: 0)
Here is a custom split function that derives two sets of row numbers according to a given proportion:
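splitFactor <- function(rows, prop){
  # Sample the training rows, then take the remaining rows as the test set
  # so that the two sets do not overlap.
  a <- sample(seq_len(rows), ceiling(rows*prop))
  b <- setdiff(seq_len(rows), a)
  list(sort(a), b)
}
Use the function to derive the training and test row sets: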
sp.53 <- splitFactor(nrow(iris), .53)
lapply(sp.53, length)
# [[1]]
# [1] 80
# [[2]]
# [1] 70
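The row numbers returned by such a function can then be used to index the data frame itself. The following is a self-contained sketch on the built-in iris data; `split_rows` is a hypothetical helper written for this example that returns named, non-overlapping row sets, and the seed is arbitrary.

```r
# Hypothetical helper: sample the training rows and use the remaining
# rows as the test set, returned as a named list.
split_rows <- function(rows, prop) {
  train <- sort(sample(seq_len(rows), ceiling(rows * prop)))
  test  <- setdiff(seq_len(rows), train)
  list(train = train, test = test)
}

set.seed(1)                            # arbitrary seed for repeatability
sp <- split_rows(nrow(iris), 0.53)

iris.train <- iris[sp$train, ]
iris.test  <- iris[sp$test, ]

nrow(iris.train)                       # 80
nrow(iris.test)                        # 70
length(intersect(sp$train, sp$test))   # 0, no row is in both sets
```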