I am working with the College data frame in R. I have the following code:
input.df = College
# Making df.train similar to input.df but with zero rows.
df.train = input.df[0,]
temp.split = input.df[input.df[,1] == "Yes",]
sample.size = floor(0.75 * nrow(temp.split))
train.ind = sample(seq_len(nrow(temp.split)),size = sample.size)
temp.train = temp.split[train.ind, ]
df.train = merge(x = df.train, y = temp.train, all = TRUE)
After the merge I lose the index (the row names).
head(input.df)
Private Apps Accept Enroll Top10perc Top25perc
Abilene Christian University Yes 1660 1232 721 23 52
Adelphi University Yes 2186 1924 512 16 29
Adrian College Yes 1428 1097 336 22 50
Agnes Scott College Yes 417 349 137 60 89
Alaska Pacific University Yes 193 146 55 16 44
Albertson College Yes 587 479 158 38 62
head(temp.train, n = 4)
Private Apps Accept Enroll Top10perc Top25perc
University of South Florida No 7589 4676 1876 29 63
Virginia Tech No 15712 11719 4277 29 53
Valley City State University No 368 344 212 5 27
Winthrop University No 2320 1805 769 24 61
head(df.train)
Private Apps Accept Enroll Top10perc Top25perc F.Undergrad P.Undergrad
1 No 233 233 153 5 12 658 58
2 No 285 280 208 21 43 1140 473
3 No 368 344 212 5 27 863 189
4 No 434 412 319 10 30 1376 237
5 No 441 369 172 17 45 633 317
6 No 480 405 380 19 46 1673 1014
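The output above is truncated to fit the window.
You can see that row names such as "University of South Florida" and "Virginia Tech" are lost after the merge. Is there a way to keep them?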
Answer 0 (score: 2)
Unfortunately, you can't do that with merge (i.e. preserve the row.names).
Note that dplyr::left_join(), for example, has the same issue.
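To see the behaviour in isolation, here is a minimal sketch that uses the built-in mtcars data instead of College (an assumption made only so the snippet runs without extra packages). The names src, empty and merged are illustrative and not from the question.

```r
# Four rows of mtcars, whose row names are car models.
src <- mtcars[3:6, c("mpg", "cyl", "hp")]

# Same columns as src but zero rows, like df.train in the question.
empty <- src[0, ]

# Merge on all shared columns, keeping unmatched rows from both sides.
merged <- merge(x = empty, y = src, all = TRUE)

rownames(src)     # the car model names
rownames(merged)  # automatic row numbers; the car names are gone
```

The result still holds the four rows of data, but merge() assigns fresh automatic row names, so the original labels cannot be recovered from the result alone.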
I'm afraid you will have to temporarily add the row.names as a regular column for the duration of the merge, e.g.:
df.train = transform(merge(
x = df.train,
y = cbind(rownames = rownames(temp.train), temp.train),
all = TRUE
), row.names = rownames, rownames = NULL)
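The same idea can be spelled out step by step. The sketch below is a variation on the code above, not a replacement for it: it uses the built-in mtcars data rather than College, a hypothetical helper column named rn, and plain assignments instead of transform(), all of which are assumptions made for illustration.

```r
# Four rows of mtcars, whose row names are car models.
src <- mtcars[3:6, c("mpg", "cyl", "hp")]

# Same columns as src but zero rows, like df.train in the question.
empty <- src[0, ]

# Step 1: carry the row names along as an ordinary column ("rn").
with_names <- cbind(rn = rownames(src), src)

# Step 2: merge as before; "rn" is just another column of y.
merged <- merge(x = empty, y = with_names, all = TRUE)

# Step 3: restore the row names from the helper column, then drop it.
rownames(merged) <- as.character(merged$rn)
merged$rn <- NULL

merged["Valiant", ]  # rows can be looked up by name again
```

Note that row names must be unique, so this approach assumes the merge does not duplicate any row of the named data frame.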