I have a dataset that looks like this:
Person Team
36471430 15326406
37242356 15326406
34945710 15326406
29141024 15326406
10323768 15326124
647293 15326124
32358093 15326124
2144524 15326124
35199422 6692854
32651004 6692854
32309524 6692854
22701991 6692854
32343507 8540767
8343828 8540767
22669737 8540767
1128141 6596680
34840462 6596680
513193 6596523
8748403 6596523
29284130 15326509
8554552 15326509
33051835 15326628
32339184 15326628
32979394 15326628
30357112 15326628
I want the data to look like this:
Team Person 1 Person 2 Person 3 Person 4
15326406 36471430 37242356 34945710 29141024
15326124 10323768 647293 32358093 2144524
6692854 35199422 32651004 32309524 22701991
8540767 32343507 8343828 22669737 NA
6596680 1128141 34840462 NA NA
6596523 513193 8748403 NA NA
15326509 29284130 8554552 NA NA
15326628 33051835 32339184 32979394 30357112
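I've been working in R, but I can't figure it out.
FYI: 4 is not the maximum number of people per team. Some teams have as many as 30 people... I just didn't want to type out a big example here. Also, there are many more variables in the dataset, but these are the only ones you need to answer my question (I think).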
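For anyone who wants to run the answers below, here is a minimal sketch that pastes the sample data straight into R with read.table(). The name dat matches what most answers use (the last answer calls the same data ana); the sanity checks at the end are only there to confirm the data was read as expected.

```r
# Read the sample data from the question into a data frame called dat.
dat <- read.table(header = TRUE, text = "
Person Team
36471430 15326406
37242356 15326406
34945710 15326406
29141024 15326406
10323768 15326124
647293 15326124
32358093 15326124
2144524 15326124
35199422 6692854
32651004 6692854
32309524 6692854
22701991 6692854
32343507 8540767
8343828 8540767
22669737 8540767
1128141 6596680
34840462 6596680
513193 6596523
8748403 6596523
29284130 15326509
8554552 15326509
33051835 15326628
32339184 15326628
32979394 15326628
30357112 15326628
")

# Sanity checks: number of rows, number of teams, largest team size.
nrow(dat)
length(unique(dat$Team))
max(table(dat$Team))
```

With the real file, text = "..." would be replaced by a file path.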
Answer 0 (score: 2)
rbind.fill.matrix gets you there, at the cost of losing the names. I would have thought some other reshape2 or plyr function would do better:
> plyr::rbind.fill.matrix( tapply(dat$Person, dat$Team, matrix, nrow=1) )
1 2 3 4
[1,] 513193 8748403 NA NA
[2,] 1128141 34840462 NA NA
[3,] 35199422 32651004 32309524 22701991
[4,] 32343507 8343828 22669737 NA
[5,] 10323768 647293 32358093 2144524
[6,] 36471430 37242356 34945710 29141024
[7,] 29284130 8554552 NA NA
[8,] 33051835 32339184 32979394 30357112
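If you want to keep the Team IDs that rbind.fill.matrix drops, the same pad-and-stack idea can be sketched in base R: split the Person IDs by team, pad each vector with NA up to the largest team size using `length<-`, and bind the results so the row names carry the Team IDs. The three-team data frame is a subset of the question's data, used only to keep the sketch self-contained; groups, n and wide are illustrative names.

```r
# Three-team subset of the question's data.
dat <- data.frame(
  Person = c(36471430, 37242356, 34945710, 29141024,
             32343507, 8343828, 22669737,
             1128141, 34840462),
  Team   = c(15326406, 15326406, 15326406, 15326406,
             8540767, 8540767, 8540767,
             6596680, 6596680)
)

# One vector of Person IDs per team; the list names are the Team IDs.
groups <- split(dat$Person, dat$Team)

# The largest team determines how many Person columns are needed.
n <- max(lengths(groups))

# `length<-` pads each vector with NA up to length n; sapply() returns one
# column per team, so transpose to get one row per team.
wide <- t(sapply(groups, `length<-`, n))
colnames(wide) <- paste("Person", seq_len(n))
wide
```

As with tapply(), split() orders the teams by ascending Team ID rather than by first appearance.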
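I thought this might be better in some respects (note that because value.var is not given, dcast falls back to Team as the value column, as its message says; pass value.var = "Person" to collect the Person IDs instead):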
library(reshape2)
dcast(dat, Team ~ ., list)
Using Team as value column: use value.var to override.
Team NA
1 6596523 6596523, 6596523
2 6596680 6596680, 6596680
3 6692854 6692854, 6692854, 6692854, 6692854
4 8540767 8540767, 8540767, 8540767
5 15326124 15326124, 15326124, 15326124, 15326124
6 15326406 15326406, 15326406, 15326406, 15326406
7 15326509 15326509, 15326509
8 15326628 15326628, 15326628, 15326628, 15326628
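The usual reshape2 route to a genuinely wide table is to add a within-team counter first and cast on it, for example dcast(dat, Team ~ Player, value.var = "Person") with Player built as in the base-R answer further down. The sketch below performs the same cast in base R with tapply() over two grouping factors, so it needs no packages; the counter name idx and the three-team subset are assumptions made for illustration.

```r
# Three-team subset of the question's data.
dat <- data.frame(
  Person = c(36471430, 37242356, 34945710, 29141024,
             32343507, 8343828, 22669737,
             1128141, 34840462),
  Team   = c(15326406, 15326406, 15326406, 15326406,
             8540767, 8540767, 8540767,
             6596680, 6596680)
)

# Within-team counter 1, 2, 3, ... following the original row order.
idx <- ave(seq_along(dat$Team), dat$Team, FUN = seq_along)

# Cross-tabulate Person by (Team, counter). Each occupied cell holds exactly
# one value; combinations that never occur are filled with NA.
wide <- tapply(dat$Person, list(Team = dat$Team, Person = idx), identity)
wide
```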
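Answer 1 (score: 1)
You can build this table in base R with split-apply-combine. First I compute the number of columns to create, then I build the table itself, and finally I set the column names.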
num.person <- max(table(dat$Team))
teams <- do.call(rbind, lapply(split(dat, dat$Team), function(x) {
  c(x$Team[1], x$Person, rep(NA, num.person - nrow(x)))
}))
colnames(teams) <- c("Team", paste("Person", seq(num.person)))
teams
# Team Person 1 Person 2 Person 3 Person 4
# 6596523 6596523 513193 8748403 NA NA
# 6596680 6596680 1128141 34840462 NA NA
# 6692854 6692854 35199422 32651004 32309524 22701991
# 8540767 8540767 32343507 8343828 22669737 NA
# 15326124 15326124 10323768 647293 32358093 2144524
# 15326406 15326406 36471430 37242356 34945710 29141024
# 15326509 15326509 29284130 8554552 NA NA
# 15326628 15326628 33051835 32339184 32979394 30357112
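Note that do.call(rbind, ...) returns a matrix rather than a data frame, and its row names simply repeat the Team column. If a data frame is needed, a minimal conversion is sketched below; it reruns the answer's code on a three-team subset of the question's data, and teams_df is an illustrative name.

```r
# Three-team subset of the question's data.
dat <- data.frame(
  Person = c(36471430, 37242356, 34945710, 29141024,
             32343507, 8343828, 22669737,
             1128141, 34840462),
  Team   = c(15326406, 15326406, 15326406, 15326406,
             8540767, 8540767, 8540767,
             6596680, 6596680)
)

# The answer's code, unchanged in substance.
num.person <- max(table(dat$Team))
teams <- do.call(rbind, lapply(split(dat, dat$Team), function(x) {
  c(x$Team[1], x$Person, rep(NA, num.person - nrow(x)))
}))
colnames(teams) <- c("Team", paste("Person", seq(num.person)))

# Convert the numeric matrix to a data frame, keeping the spaces in the
# column names, then drop the row names that duplicate the Team column.
teams_df <- data.frame(teams, check.names = FALSE)
rownames(teams_df) <- NULL
str(teams_df)
```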
Answer 2 (score: 1)
Lots of good answers. Here is a short one that uses only base R, in two simple steps:
First, add a "Player" column to your data:
dat <- transform(dat, Player = ave(Team, Team, FUN = seq_along))
head(dat)
# Person Team Player
# 1 36471430 15326406 1
# 2 37242356 15326406 2
# 3 34945710 15326406 3
# 4 29141024 15326406 4
# 5 10323768 15326124 1
# 6 647293 15326124 2
Then reshape from long to wide format:
reshape(dat, idvar = "Team", timevar = "Player", direction = "wide")
# Team Person.1 Person.2 Person.3 Person.4
# 1 15326406 36471430 37242356 34945710 29141024
# 5 15326124 10323768 647293 32358093 2144524
# 9 6692854 35199422 32651004 32309524 22701991
# 13 8540767 32343507 8343828 22669737 NA
# 16 6596680 1128141 34840462 NA NA
# 18 6596523 513193 8748403 NA NA
# 20 15326509 29284130 8554552 NA NA
# 22 15326628 33051835 32339184 32979394 30357112
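Two small adjustments may help with the full dataset mentioned in the question. By default reshape() spreads every column other than idvar and timevar, so the extra variables should be dropped first; and sep = " " yields the column names Person 1, Person 2, ... from the desired output instead of Person.1. A sketch, assuming a hypothetical column Other that stands in for those extra variables:

```r
# Three-team subset of the question's data plus a hypothetical extra column.
dat <- data.frame(
  Person = c(36471430, 37242356, 34945710, 29141024,
             32343507, 8343828, 22669737,
             1128141, 34840462),
  Team   = c(15326406, 15326406, 15326406, 15326406,
             8540767, 8540767, 8540767,
             6596680, 6596680),
  Other  = seq_len(9)  # hypothetical stand-in for the other variables
)

# Keep only the columns the reshape needs, then add the within-team counter.
long <- dat[c("Team", "Person")]
long$Player <- ave(long$Team, long$Team, FUN = seq_along)

# sep = " " builds the wide names as "Person 1", "Person 2", ...
wide <- reshape(long, idvar = "Team", timevar = "Player",
                direction = "wide", sep = " ")
rownames(wide) <- NULL  # drop the leftover original row numbers
wide
```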
Local from Atlanta, GA! Cheers!
Answer 3 (score: 0)
Here is another approach, where ana is your data:
library(dplyr)
library(tidyr)
ana %>%
  group_by(Team) %>%
  mutate(count = row_number(Person)) %>%
  do(spread(., count, Person))
Team 1 2 3 4
1 6596523 513193 8748403 NA NA
2 6596680 1128141 34840462 NA NA
3 6692854 22701991 32309524 32651004 35199422
4 8540767 8343828 22669737 32343507 NA
5 15326124 647293 2144524 10323768 32358093
6 15326406 29141024 34945710 36471430 37242356
7 15326509 8554552 29284130 NA NA
8 15326628 30357112 32339184 32979394 33051835
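Note that row_number(Person) numbers rows in order of ascending Person ID (dplyr documents it as equivalent to rank(ties.method = "first")), not in their original order. That is why each row above lists its people sorted, e.g. 22701991 32309524 32651004 35199422 for team 6692854, whereas the question has 35199422 32651004 32309524 22701991. A minimal base-R illustration of the two counters, using that one team from the question:

```r
# One team from the question, rows in their original order.
dat <- data.frame(
  Person = c(35199422, 32651004, 32309524, 22701991),
  Team   = rep(6692854, 4)
)

# Position-based counter: the order the rows appear in.
by_position <- ave(dat$Person, dat$Team, FUN = seq_along)

# Rank-based counter: ascending Person ID, which is what row_number(Person) does.
by_rank <- ave(dat$Person, dat$Team,
               FUN = function(p) rank(p, ties.method = "first"))

cbind(dat, by_position, by_rank)
```

In the dplyr pipeline, mutate(count = row_number()) with no argument numbers rows by position within each group, which keeps the order the question asks for.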