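I want to subset my data on a condition that involves two columns at the same time.
This is similar to: subsetting data using multiple variables in R
For example, suppose I have a dataset called Gamedat: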
Games People Hoursplayed
goldeneye Michael 5
goldeneye Thatcher 8
goldeneye Dexter 12
goldeneye Dexter 15
pacman Dexter 2
tetris Clint 5
tetris Dexter 8
goldeneye Thatcher 12
pacman Thatcher 15
goldeneye Clint 2
pacman Michael 5
pacman Michael 8
pacman Clint 12
tetris John 15
tetris Clint 2
ageofempires Clint 5
pacman Dexter 8
ageofempires Thatcher 12
ageofempires John 15
goldeneye Dexter 2
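Say I am interested in the game goldeneye. I want to find every row, in any game, where a player logged the same number of hours as that same player logged in some goldeneye row (this makes more sense in my real dataset).
So I do this:
Gameofinterest <- Gamedat[ grep("goldeneye", Gamedat[ ,1]), ]
Then I do this: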
subset(Gamedat, Gamedat[ ,2] %in% Gameofinterest[ ,2] &
Gamedat[ ,3] %in% Gameofinterest[ ,3])
But this gives me:
Games People Hoursplayed
goldeneye Michael 5
goldeneye Thatcher 8
goldeneye Dexter 12
goldeneye Dexter 15
pacman Dexter 2
tetris Clint 5
tetris Dexter 8
goldeneye Thatcher 12
pacman Thatcher 15
goldeneye Clint 2
pacman Michael 5
pacman Michael 8
pacman Clint 12
tetris Clint 2
ageofempires Clint 5
pacman Dexter 8
ageofempires Thatcher 12
goldeneye Dexter 2
When what I really want is this:
Games People Hoursplayed
goldeneye Michael 5
goldeneye Thatcher 8
goldeneye Dexter 12
goldeneye Dexter 15
pacman Dexter 2
goldeneye Thatcher 12
goldeneye Clint 2
pacman Michael 5
tetris Clint 2
ageofempires Thatcher 12
goldeneye Dexter 2
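In short, I want to find rows that match on the pair "People & Hoursplayed" jointly,
rather than on "People" and "Hoursplayed" separately... does that make sense?
I know I could do this:
Gamedat$PHpaste <- paste(Gamedat$People, Gamedat$Hoursplayed, sep="")
# re-create the subset so that it includes the new PHpaste column
Gameofinterest <- Gamedat[ grep("goldeneye", Gamedat[ ,1]), ]
Gamedat[Gamedat[ ,4] %in% Gameofinterest[ ,4], ]
and get: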
Games People Hoursplayed PHpaste
goldeneye Michael 5 Michael5
goldeneye Thatcher 8 Thatcher8
goldeneye Dexter 12 Dexter12
goldeneye Dexter 15 Dexter15
pacman Dexter 2 Dexter2
goldeneye Thatcher 12 Thatcher12
goldeneye Clint 2 Clint2
pacman Michael 5 Michael5
tetris Clint 2 Clint2
ageofempires Thatcher 12 Thatcher12
goldeneye Dexter 2 Dexter2
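Is there something more elegant?
Answer 0: (score: 0)
I think this can be done with dplyr. First, use filter to retrieve the rows where the game is goldeneye. Then use inner_join to join those rows back to the original data on People and Hoursplayed. Optionally, select the columns you need and arrange by People.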
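library(dplyr)
Gamedat %>%
  # keep only the goldeneye rows
  filter(Games == "goldeneye") %>%
  # match them against all rows on the (People, Hoursplayed) pair
  inner_join(Gamedat, by = c("People", "Hoursplayed")) %>%
  # Games.y is the game from the full data; Games.x is always goldeneye
  select(Games = Games.y, People, Hoursplayed) %>%
  arrange(People)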
Result:
Games People Hoursplayed
1 goldeneye Clint 2
2 tetris Clint 2
3 goldeneye Dexter 12
4 goldeneye Dexter 15
5 pacman Dexter 2
6 goldeneye Dexter 2
7 goldeneye Michael 5
8 pacman Michael 5
9 goldeneye Thatcher 8
10 goldeneye Thatcher 12
11 ageofempires Thatcher 12
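A possible variant of the same idea, offered as a sketch rather than a definitive solution: dplyr's semi_join keeps the rows of the first table that have a match in the second, without adding columns. That avoids the Games.x / Games.y renaming and keeps the rows in their original order. The names goldeneye_rows and matched are illustrative and do not appear in the question; the data frame is rebuilt here from the question's table so the block is self-contained.

```r
library(dplyr)

# Rebuild the example data from the question
Gamedat <- data.frame(
  Games = c("goldeneye", "goldeneye", "goldeneye", "goldeneye", "pacman",
            "tetris", "tetris", "goldeneye", "pacman", "goldeneye",
            "pacman", "pacman", "pacman", "tetris", "tetris",
            "ageofempires", "pacman", "ageofempires", "ageofempires",
            "goldeneye"),
  People = c("Michael", "Thatcher", "Dexter", "Dexter", "Dexter",
             "Clint", "Dexter", "Thatcher", "Thatcher", "Clint",
             "Michael", "Michael", "Clint", "John", "Clint",
             "Clint", "Dexter", "Thatcher", "John", "Dexter"),
  Hoursplayed = rep(c(5, 8, 12, 15, 2), times = 4),
  stringsAsFactors = FALSE
)

# Rows for the game of interest (hypothetical helper name)
goldeneye_rows <- Gamedat %>% filter(Games == "goldeneye")

# Keep every row of Gamedat whose (People, Hoursplayed) pair
# also occurs in a goldeneye row; no columns are added
matched <- Gamedat %>%
  semi_join(goldeneye_rows, by = c("People", "Hoursplayed"))

print(matched)
```

Because semi_join only filters, each row of Gamedat appears at most once in the result, even if the same (People, Hoursplayed) pair occurs in several goldeneye rows.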