Subsetting by a combination of two columns

Time: 2017-02-28 00:09:33

Tags: r indexing subset multiple-columns

I want to subset data conditioned on two columns at the same time.

Similar to this question: subsetting data using multiple variables in R

For example:

Suppose I have a dataset called Gamedat:

        Games    People Hoursplayed
    goldeneye   Michael           5
    goldeneye  Thatcher           8
    goldeneye    Dexter          12
    goldeneye    Dexter          15
       pacman    Dexter           2
       tetris     Clint           5
       tetris    Dexter           8
    goldeneye  Thatcher          12
       pacman  Thatcher          15
    goldeneye     Clint           2
       pacman   Michael           5
       pacman   Michael           8
       pacman     Clint          12
       tetris      John          15
       tetris     Clint           2
 ageofempires     Clint           5
       pacman    Dexter           8
 ageofempires  Thatcher          12
 ageofempires      John          15
    goldeneye    Dexter           2

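Say I want to look at a game like goldeneye, and I want to see every case where a player spent the same number of hours on a game as they did on goldeneye (this is more useful in my real dataset).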

So I do this:

 Gameofinterest <- Gamedat[ grep("goldeneye", Gamedat[ ,1]), ]

Then I do this:

  subset(Gamedat, Gamedat[ ,2] %in% Gameofinterest[ ,2] & 
  Gamedat[ ,3] %in% Gameofinterest[ ,3])

But this gives me:

       Games   People Hoursplayed
   goldeneye  Michael           5
   goldeneye Thatcher           8
   goldeneye   Dexter          12
   goldeneye   Dexter          15
      pacman   Dexter           2
      tetris    Clint           5
      tetris   Dexter           8
   goldeneye Thatcher          12
      pacman Thatcher          15
   goldeneye    Clint           2
      pacman  Michael           5
      pacman  Michael           8
      pacman    Clint          12
      tetris    Clint           2
ageofempires    Clint           5
      pacman   Dexter           8
ageofempires Thatcher          12
   goldeneye   Dexter           2

When what I actually want is this:

         Games   People Hoursplayed
     goldeneye  Michael           5
     goldeneye Thatcher           8
     goldeneye   Dexter          12
     goldeneye   Dexter          15
        pacman   Dexter           2
     goldeneye Thatcher          12
     goldeneye    Clint           2
        pacman  Michael           5
        tetris    Clint           2
  ageofempires Thatcher          12
     goldeneye   Dexter           2

In short, I want rows that match on the combination "People & Hoursplayed",

not rows that match on "People" and, separately, on "Hoursplayed". Does that make sense?
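To see why the `subset()` call above is too permissive, here is a small sketch built from the question's goldeneye rows. It checks one row that shows up in the unwanted output, tetris/Clint/5, both column by column and as a pair. Using `merge()` as the pair test is my choice for illustration, and the names `golden`, `candidate`, `col_wise` and `pair_wise` are hypothetical, not from the question.

```r
# The (People, Hoursplayed) pairs that occur for goldeneye in the question's data
golden <- data.frame(
  People      = c("Michael", "Thatcher", "Dexter", "Dexter", "Thatcher", "Clint", "Dexter"),
  Hoursplayed = c(5, 8, 12, 15, 12, 2, 2),
  stringsAsFactors = FALSE
)

# One row from the full data: Clint played tetris for 5 hours
candidate <- data.frame(People = "Clint", Hoursplayed = 5, stringsAsFactors = FALSE)

# Column-by-column test, as in the subset() call: TRUE, because "Clint" is
# some goldeneye player and 5 is some goldeneye hour count
col_wise <- candidate$People %in% golden$People &
  candidate$Hoursplayed %in% golden$Hoursplayed

# Pair test: merge() joins on both shared columns, so it only finds a match
# if the exact (Clint, 5) combination exists in golden
pair_wise <- nrow(merge(candidate, golden)) > 0

col_wise   # [1] TRUE
pair_wise  # [1] FALSE
```

Clint did play goldeneye, and someone did play goldeneye for 5 hours, but Clint never played goldeneye for 5 hours, which is why the two `%in%` tests combined with `&` let this row through.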

I know I can do this:

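 Gamedat$PHpaste <- paste(Gamedat$People, Gamedat$Hoursplayed, sep="")
 # rebuild the subset so it also carries the new PHpaste column
 Gameofinterest <- Gamedat[ grep("goldeneye", Gamedat[ ,1]), ]

 Gamedat[Gamedat[ ,4] %in% Gameofinterest[ ,4], ]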

And get:

        Games   People Hoursplayed    PHpaste
    goldeneye  Michael           5   Michael5
    goldeneye Thatcher           8  Thatcher8
    goldeneye   Dexter          12   Dexter12
    goldeneye   Dexter          15   Dexter15
       pacman   Dexter           2    Dexter2
    goldeneye Thatcher          12 Thatcher12
    goldeneye    Clint           2     Clint2
       pacman  Michael           5   Michael5
       tetris    Clint           2     Clint2
 ageofempires Thatcher          12 Thatcher12
    goldeneye   Dexter           2    Dexter2
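The same pasted-key idea can be sketched without adding a column to `Gamedat` or rebuilding the subset. This is a sketch under two assumptions: the helper `pair_key()` is hypothetical, and `"|"` never appears in `People`. The separator matters because with `sep=""` a name ending in a digit could collide: `paste("Ann1", 2, sep="")` and `paste("Ann", 12, sep="")` both give `"Ann12"`.

```r
# The question's data, read from the printed table
Gamedat <- read.table(header = TRUE, stringsAsFactors = FALSE, text = "
       Games    People Hoursplayed
   goldeneye   Michael           5
   goldeneye  Thatcher           8
   goldeneye    Dexter          12
   goldeneye    Dexter          15
      pacman    Dexter           2
      tetris     Clint           5
      tetris    Dexter           8
   goldeneye  Thatcher          12
      pacman  Thatcher          15
   goldeneye     Clint           2
      pacman   Michael           5
      pacman   Michael           8
      pacman     Clint          12
      tetris      John          15
      tetris     Clint           2
ageofempires     Clint           5
      pacman    Dexter           8
ageofempires  Thatcher          12
ageofempires      John          15
   goldeneye    Dexter           2
")

# Hypothetical helper: one string key per row from People and Hoursplayed.
# Assumes "|" never occurs in People, so distinct pairs give distinct keys.
pair_key <- function(df) paste(df$People, df$Hoursplayed, sep = "|")

golden <- Gamedat[Gamedat$Games == "goldeneye", ]

# Keep every row whose (People, Hoursplayed) key occurs among the goldeneye rows
result <- Gamedat[pair_key(Gamedat) %in% pair_key(golden), ]
result
```

Because this only indexes `Gamedat`, the rows come back in their original order, the same 11 rows as the desired output above, and `Gamedat` keeps its three columns.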

Is there something more elegant?

1 Answer:

Answer 0 (score: 0)

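I think this can be done with dplyr. First, use filter to keep the rows where Games is goldeneye. Then use inner_join to join those rows back to the original data on People and Hoursplayed. Optionally, select the columns you need and arrange by People.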

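library(dplyr)
Gamedat %>% 
  filter(Games == "goldeneye") %>%                          # goldeneye rows only
  inner_join(Gamedat, by = c("People", "Hoursplayed")) %>%  # rows sharing the same (People, Hoursplayed) pair
  select(Games = Games.y, People, Hoursplayed) %>%          # Games.y is the game from the joined copy
  arrange(People)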

Result:

          Games   People Hoursplayed
1     goldeneye    Clint           2
2        tetris    Clint           2
3     goldeneye   Dexter          12
4     goldeneye   Dexter          15
5        pacman   Dexter           2
6     goldeneye   Dexter           2
7     goldeneye  Michael           5
8        pacman  Michael           5
9     goldeneye Thatcher           8
10    goldeneye Thatcher          12
11 ageofempires Thatcher          12
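For comparison, here is a base R sketch of the same filter-then-join idea using `merge()`, which with its default `all = FALSE` keeps only rows whose key columns match in both inputs, i.e. an inner join. The `unique()` step is an addition not in the answer: it stops a (People, Hoursplayed) pair that occurs more than once for goldeneye from duplicating the joined rows. The names `keys` and `result` are illustrative.

```r
# The question's data, read from the printed table
Gamedat <- read.table(header = TRUE, stringsAsFactors = FALSE, text = "
Games People Hoursplayed
goldeneye Michael 5
goldeneye Thatcher 8
goldeneye Dexter 12
goldeneye Dexter 15
pacman Dexter 2
tetris Clint 5
tetris Dexter 8
goldeneye Thatcher 12
pacman Thatcher 15
goldeneye Clint 2
pacman Michael 5
pacman Michael 8
pacman Clint 12
tetris John 15
tetris Clint 2
ageofempires Clint 5
pacman Dexter 8
ageofempires Thatcher 12
ageofempires John 15
goldeneye Dexter 2
")

# Step 1: the goldeneye rows, reduced to their distinct (People, Hoursplayed) pairs
keys <- unique(Gamedat[Gamedat$Games == "goldeneye", c("People", "Hoursplayed")])

# Step 2: inner join on both columns; only rows whose pair is in keys survive
result <- merge(Gamedat, keys, by = c("People", "Hoursplayed"))

# merge() puts the key columns first, so restore the original column order
result <- result[, c("Games", "People", "Hoursplayed")]
result
```

Since `merge()` sorts by the key columns by default (`sort = TRUE`), the rows come out ordered by People and then Hoursplayed, much like the `arrange(People)` step in the dplyr version.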