How do I collect all the distinct names in a data frame column and count how many times each name occurs?

Time: 2015-09-11 23:10:32

Tags: r count dataframe

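I have a data frame with 6181 rows (one for each entrant in a large fantasy football contest). One column of this data frame lists the 9 football players on each entrant's roster.

I would like R to give me all the distinct football player names that appear in this column (there are hundreds) and to count how many times each individual name appears.

Here is an example of a cell in that column: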

QB Dane Evans QB Jaquez Johnson RB Zack Langer RB Greg Howell WR Keyarris Garrett WR Jenson Stoshak WR Keevan Lucas FLEX Jordan Howard FLEX Sony Michel

For this I would want the following output (if I were using only 1 row instead of 6181):

QB Dane Evans - 1

QB Jaquez Johnson - 1 

RB Zack Langer - 1

RB Greg Howell - 1

WR Keyarris Garrett - 1 

WR Jenson Stoshak - 1

WR Keevan Lucas - 1

FLEX Jordan Howard - 1

FLEX Sony Michel - 1

Or 100% instead of 1. 

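Most of the answers I found when searching for this problem seem to show how to count the number of times a specific combination of 9 players, listed in a specific order, appears, rather than the count of individual names across all rows.

1 answer:

Answer 0 (score: 2)

My humble solution: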

# Data frame
my.players <- data.frame( name = "QB Dane Evans QB Jaquez Johnson RB Zack Langer RB Greg Howell WR Keyarris Garrett WR Jenson Stoshak WR Keevan Lucas FLEX Jordan Howard FLEX Sony Miche")

# Position dictionary. Add all positions here in that format.
# Each pattern matches a position label plus any spaces around it.
# (A backslash before the space is not a valid escape in an R string, so plain spaces are used.)
pos.dic    <- c( " *QB *"
               , " *RB *"
               , " *WR *"
               , " *FLEX *"
               )

# Regex for positions
pos.regex <- paste( pos.dic, collapse = "|" )

# Replace each position label with a comma
play.names <- gsub( pattern     = pos.regex
                  , replacement = ","
                  , x           = my.players$name
                  )

# Split on the commas
play.names <- strsplit( x = play.names, split = ",")

# Unlist
play.names <- unlist( x = play.names )

# Drop the empty string left in front of the first name of every row
play.names <- play.names[ play.names != "" ]

# Result
# [1] "Dane Evans"       "Jaquez Johnson"   "Zack Langer"      "Greg Howell"      "Keyarris Garrett" "Jenson Stoshak"   "Keevan Lucas"     "Jordan Howard"
# [9] "Sony Miche"
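The question's desired output keeps the position label in front of each name. As a minimal sketch of that variant (not part of the original answer), one could split each roster at the space that precedes a position label instead of deleting the labels. The two sample rosters and the names `rosters`, `positions`, `split.regex` and `entries` are made up for illustration, and the sketch assumes every entry starts with one of the listed labels followed by a space.

```r
# Two hypothetical rosters standing in for the 6181-row column
rosters <- c(
  "QB Dane Evans QB Jaquez Johnson RB Zack Langer",
  "QB Dane Evans RB Zack Langer WR Keevan Lucas"
)

# Position labels; extend this vector if the data uses others
positions <- c("QB", "RB", "WR", "FLEX")

# Match a single space that is followed by "<label> ".
# The lookahead keeps the label in the next piece; it needs perl = TRUE.
split.regex <- paste0(" (?=(?:", paste(positions, collapse = "|"), ") )")

# One character vector with every "label name" entry from all rows
entries <- unlist(strsplit(rosters, split.regex, perl = TRUE))

# Frequency of each entry across all rosters
counts <- table(entries)
print(counts)
```

Because the label stays attached, a player listed once as RB and once as FLEX is counted as two different entries, which may or may not be what you want.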

Then use the table function, which returns a frequency table. From its documentation:

 ‘table’ uses the cross-classifying factors to build a contingency
     table of the counts at each combination of factor levels.

Example:

freq.table <- table(x = play.names )    
  Dane Evans      Greg Howell   Jaquez Johnson   Jenson Stoshak    Jordan Howard     Keevan Lucas Keyarris Garrett       Sony Miche      Zack Langer 
           1                1                1                1                1                1                1                1                1 
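With hundreds of distinct names, a wide printed table gets hard to read. A possible follow-up, sketched here with a small made-up `play.names` vector, is to sort the counts and turn them into a two-column data frame. The column names `player` and `count` are my own choice, not something from the original answer.

```r
# Hypothetical vector of names, as it might look after splitting several rows
play.names <- c("Dane Evans", "Zack Langer", "Dane Evans",
                "Keevan Lucas", "Dane Evans", "Zack Langer")

# Count each distinct name
freq.table <- table(play.names)

# Most frequently picked players first
freq.sorted <- sort(freq.table, decreasing = TRUE)

# One row per player; rename the default columns
freq.df <- as.data.frame(freq.sorted, stringsAsFactors = FALSE)
names(freq.df) <- c("player", "count")

print(freq.df)
```

The data frame form is also easier to filter, for example `freq.df[freq.df$count >= 2, ]`.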

Then, if you prefer percentages, use prop.table :):

# Stored under a different name so the variable does not mask the prop.table function
pct.table <- prop.table( x = freq.table )

pct.table <- round( x      = pct.table * 100
                  , digits = 2
                  )

Dane Evans      Greg Howell   Jaquez Johnson   Jenson Stoshak    Jordan Howard     Keevan Lucas Keyarris Garrett       Sony Miche      Zack Langer 
           11.11            11.11            11.11            11.11            11.11            11.11            11.11            11.11            11.11
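Note that these percentages are shares of all picks, which is why a single roster gives 11.11 for each of its 9 players. The question's "100% instead of 1" reads more like the share of rosters that contain each player. A rough sketch of that reading, assuming the names are kept grouped per row (a made-up list `rosters` here) rather than unlisted right away:

```r
# Hypothetical rosters, one character vector of player names per row
rosters <- list(
  c("Dane Evans", "Zack Langer"),
  c("Dane Evans", "Keevan Lucas"),
  c("Dane Evans", "Zack Langer"),
  c("Jordan Howard", "Keevan Lucas")
)

# unique() guards against a name being counted twice within one roster
per.roster <- lapply(rosters, unique)

# Number of rosters in which each player appears
freq.table <- table(unlist(per.roster))

# Share of rosters containing each player, in percent
pct.rosters <- round(100 * freq.table / length(rosters), 2)
print(pct.rosters)
```

With a single roster every player would come out at 100, matching the output the question asks for.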