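I'm trying to sample from a dataset with 5000 rows and would like to understand a problem with my syntax. All I want is to randomly draw 500 rows from it.
A reprex of the dataset (xdata):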
AccountId Street City State ZipCode CloseFactorPct OpenFactorPct ZipIncome ZipDegree
1 455697 3919 Birkdale Ln Se Olympia WA 98501 0.75 1.40 67060 0.17879866
2 490095 29174 Wagon Rd Agoura Hills CA 91301 0.85 2.50 115125 0.21376952
3 427399 301a Franklin Ave Princeton NJ 8540 0.80 2.25 124954 0.50428200
4 470678 1461 Woodsview Way Macedon NY 14502 0.80 2.50 67780 0.13772373
5 424824 616 Locust Ave Las Animas CO 81054 0.80 2.25 31343 0.02021198
6 437343 13 New Oxford Rd Conway AR 72034 0.80 2.25 51435 0.15904222
TotalOwed
1 0.0
2 185.1
3 1645.0
4 0.0
5 0.0
6 0.0
My code:
sample2 <- xdata[sample(nrow(xdata), "500", replace=T), sample(ncol(xdata), 10, replace=T)]
head(sample2)
ZipIncome City ZipIncome.1 TotalOwed Street OpenFactorPct ZipIncome.2
14470 41866 Columbus 41866 841.31 792 Dennison Avenue 0.85 41866
23502 55221 El Paso 55221 0.00 12949 Eastbrook Drive Apt 53 0.70 55221
7370 93373 Saddle Brook 93373 570.38 229 S Boulevard 0.70 93373
31627 61830 Choudrant 61830 1156.28 153 Jones Street 0.70 61830
29840 39697 Beckley 39697 0.00 2109 S Kanawha St 0.75 39697
14938 91313 Bradenton 91313 0.00 5007 Serata Dr 0.85 91313
ZipIncome.3 CloseFactorPct ZipIncome.4
14470 41866 0.95 41866
23502 55221 0.80 55221
7370 93373 1.20 93373
31627 61830 0.80 61830
29840 39697 0.80 39697
14938 91313 1.30 91313
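The output I get contains 4 extra copies of ZipIncome. Why does this happen? Can someone help me understand whether my syntax for drawing a random sample is wrong, or whether I need set.seed()?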
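For reference, here is a minimal sketch of what seems to be going on, using a small made-up stand-in for xdata (the values are hypothetical; only the column names come from the data above). The assumption is that the duplicates come from the second index, sample(ncol(xdata), 10, replace=T): drawing column positions with replacement can pick the same column more than once, and R then renames the repeats ZipIncome.1, ZipIncome.2, and so on. As far as I can tell, set.seed() only makes a draw reproducible and would not change this. The sketch also passes 500 as a number rather than the string "500".

```r
# Toy stand-in for xdata (hypothetical values; only the shape matters here)
set.seed(42)  # makes the draw reproducible; it does not prevent duplicates
xdata <- data.frame(
  AccountId = 1:5000,
  ZipIncome = round(runif(5000, 30000, 120000)),
  TotalOwed = round(runif(5000, 0, 2000), 2)
)

# Selecting the same column position several times duplicates the column;
# the data frame makes the repeated names unique by appending .1, .2, ...
dup_cols <- xdata[1:3, c(2, 2, 3, 2)]
names(dup_cols)  # "ZipIncome" "ZipIncome.1" "TotalOwed" "ZipIncome.2"

# Sample 500 distinct rows (replace = FALSE is the default) and leave the
# column index empty so every column is kept exactly once
sample2 <- xdata[sample(nrow(xdata), 500), ]
dim(sample2)     # 500 3
```

If rows really should be drawn with replacement, replace = TRUE can go back into the row index only; the column index can stay empty either way.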